THE EVALUATION OF CERTAIN FACTORS FOR PREDICTING
THE SUCCESS OF STUDENTS ENTERING THE COLLEGE
OF PHARMACY OF THE UNIVERSITY OF MINNESOTA
FROM 1933 THROUGH 1943*

MARJORIE E. MOORE


The Meridian Hill 
Washington, D. C. 


The general problem of this investigation 
was the evaluation of certain factors for esti- 
mating the future performance of individuals 
applying for admission to or actually entering 
the College of Pharmacy of the University of 
Minnesota. In other words, the problem was 
to determine how best to use the knowledge
of a number of characteristics of students 
entering the College of Pharmacy in predict- 
ing their future success as measured by cer- 
tain selected criteria. The main purpose was 
to develop prediction formulas and to set up 
methods for interpreting predictions made 
from them, which might be used by advisers 
of these individuals for purposes of guidance. 


THE PLAN OF THE INVESTIGATION 


The broad general plan for this investiga- 
tion was to administer to each new group of 
students entering the College of Pharmacy a 
battery of tests thought to be indicative of 
success in that college, to collect additional 
data on these students relative to past and 
present scholastic achievement, and to deter- 
mine the efficacy of the different predictive 
variables on the basis of a statistical analysis 
of the data.¹

This study was based on new students 
entering the College of Pharmacy from the 
fall quarter of 1933 through the fall quarter 
of 1943, and was divided into two major sec- 
tions. The first part, the Preliminary Studies, 
was based on the new students who entered
in the fall quarters of 1933, 1934, 1935, 1936,
and 1937. The second part, the Major
Studies, was based on the new students who
entered the College of Pharmacy from the fall
quarter of 1938 through the fall quarter of
1941; students who entered from 1933
through 1943 were, however, used in certain
sections of the Major Studies.

* A summary of a thesis submitted to the faculty of the
Graduate School of the University of Minnesota in partial
fulfillment of the requirements for the degree of Doctor of
Philosophy, 1945. Dr. Palmer O. Johnson, Adviser.

1 Based on data [illegible] by and with the cooperation
of the Board of [illegible], the [illegible] Counseling Bureau,
the Office of [illegible], the College of Pharmacy,
and a Committee on [illegible] Research of the
University of Minnesota.


Almost all of the statistical techniques 
which were used in this investigation are 
widely known among statisticians and re- 
search workers; some of them, however, have 
not as yet been used extensively. Wherever 
a measure of central tendency was required, 
the arithmetic mean was used; both the 
standard deviation and the variance were 
used as measures of variability. Throughout 
this study it was necessary to test for homo- 
geneity of the basic data for two or more 
groups. When it was necessary to test for the 
homogeneity of the data for only two groups, 
the t test was used to test the significance of
the difference between means, and the F test 
was used to determine the significance of the 
difference between variances. When it was 
necessary to determine the homogeneity of 
more than two groups, the analysis of vari- 
ance technique was used to test for the 
homogeneity of the means, and the Welch-Nayer
(L₁) test was used to test for the
homogeneity of the variances. Zero-order and 
multiple correlation and regression were used 
extensively in the study, as well as the related 
appropriate tests of significance. Standard 
partial regression coefficients and their stand- 
ard errors were used to determine the relative 
value of the different predictive variables in 
estimating the various criteria of success. The 
assumptions of linearity of regression and 
homoscedasticity, which underlie the statistical
methods of correlation and regression
used in this study, were tested by means of
the analysis of variance technique and the
Welch-Nayer technique. Prediction formulas
were obtained by converting standard partial 
regression equations (relative deviate score 
form) into partial regression equations (raw 
score form). The “standard error of estimate” 
was used as one of the methods for interpret- 
ing values predicted from the regression equa- 
tions. The standard error of a specific pre- 
dicted mean value was used to determine the 
reliability, from the standpoint of errors of 
random sampling, of a specific predicted mean 
value for an individual. The formula for the 
standard error of a mean value (Y) predicted 
from three independent variables, in the form 
used in this study, is as follows:²
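
The general expression itself does not survive in this copy of the text. As a hedged sketch only, the usual textbook form for a mean value predicted from three independent variables is shown here. The symbols are assumptions of this sketch and may differ from those of the cited manuscript: $s_{y\cdot 123}$ is the standard error of estimate, $N$ the number of cases, $\bar{x}_i$ the sample mean of the $i$-th predictive variable, and $c_{ij}$ the elements of the inverse of the matrix of corrected sums of squares and products of the predictive variables.

```latex
\[
\sigma_{Y}^{2}
  = s_{y\cdot 123}^{2}
    \left[ \frac{1}{N}
      + \sum_{i=1}^{3}\sum_{j=1}^{3} c_{ij}\,(x_i-\bar{x}_i)(x_j-\bar{x}_j)
    \right]
\]
```

Factoring $1/N$ out of the bracket gives a leading constant $s_{y\cdot 123}^{2}/N$ multiplied by one plus a quadratic expression in the deviations $x_i-\bar{x}_i$, which is the shape of the numerical formula given later for the first quarter first year honor point ratio.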
The standard error of a predicted mean value
(Y) as determined for a specified set of x
values (independent or predictive variable
values) is an estimate of the standard deviation
of the random sampling distribution of
Y for the specified set of x values. The standard
error of a predicted mean value, therefore,
provides a measure of the accuracy of
estimates made from a regression plane based
on a sample from the standpoint of the
sampling errors of that plane in relation to
the true regression plane.

2 Palmer O. Johnson, University of Minnesota, unpublished
manuscript.

A large number of different predictive variables
were considered in this investigation.
Included in the test batteries were mathematics
tests, science survey tests, chemistry
tests, a pharmacy problem solving test, and
other tests of a more general nature such as
the Minnesota Reading Examination, the
Peterson Uniform Test of Intelligence, the
Pressey Senior Classification Test, and the
Wesley College Test in Social Terms. The
high school average mark, the high school
science average mark, the high school percentile
rank, and the pre-professional honor
point ratio were also considered as predictive
variables. Some of the predictive variables
were studied for only one entering group of
students, some for two entering groups, and
others for three or more entering groups;
many of the predictive variables were considered
for as many as ten different entering
groups of students. Several different criteria
of success were used to evaluate the efficiency
of the different variables for predicting the
success of students entering the College of
Pharmacy. In general, the various criteria
were as follows: the first quarter first year
honor point ratio, the full first year honor
point ratio, the first year honor point ratio
(based on one, two, or three quarters of
course work), the total honor point ratio
(based on one to twelve or more quarters of
course work), the total honor point ratio for
those earning a degree, the first quarter first
year honor point ratio for those earning a
degree, the score on the State Board Examination
(Excluding the Practical Test), and the
score on the State Board Examination
(Including the Practical Test). For students who
entered the College of Pharmacy as freshmen, 
the first year was the freshman year, but for 
students who entered the College of Phar- 
macy with advanced standing as sophomores, 
the first year was the sophomore year. Some 
variations of these criteria were used in a sub- 
sidiary study made on freshmen who con- 
tinued in the College of Pharmacy long 
enough to become sophomores; for example, 
the first quarter second year honor point 
ratio, the full second year honor point ratio, 
and the second year honor point ratio were 
used for this subsidiary study. 

THE PRELIMINARY STUDIES

There were seven preliminary studies made
on students who entered the College of Pharmacy
as freshmen or as sophomores from the
fall quarter of 1933 through the fall quarter
of 1937. The studies were as follows: the
1933-34 group (N = 31), freshmen and
sophomores combined; the 1934-35 group
(N = 42), freshmen and sophomores combined;
the 1935-36 group (N = 49), freshmen
and sophomores combined; the 1936-37
group (N = 42), freshmen and sophomores
combined; the 1935-1936 freshman group
(N = 40), freshmen for 1935 and 1936 combined;
the 1935-1936 sophomore group
(N = 51), sophomores for 1935 and 1936
combined; and the 1937-38 group, consisting
of 21 freshmen and 22 sophomores.

These Preliminary Studies represent the 
exploratory phases of the investigation, and 
all of the different predictive variables which 
were mentioned in the previous section were 
considered for one or more of the groups in 
order to determine their value for predicting 
the success of students entering the College 
of Pharmacy. Although the relative predictive 
value of the different variables varied for the 
different criteria considered and for the vari- 
ous groups studied, some rather definite trends 
were revealed, which are summarized briefly 
in the next few paragraphs. 

The Fractions Test, the Minnesota Read- 
ing Examination, the Peterson Uniform Test 
of Intelligence, the Pressey Senior Classifica- 
tion Test, the Wesley College Test in Social 
Terms, the Pharmacy Problem Solving Test, 
and the Strong Vocational Interest Blank, 
which were all considered for one or more 
groups, in general, were found to have, on the 
basis of the statistical analyses, relatively low 
predictive value in relation to the other vari- 
ables which were studied. 

The Pharmacy Mathematics Test, Test I 
and Test II (same as Test I with eight addi- 
tional items); the Johnson Science Survey 
Test, Test I, Test II (a shortened form of 
Test I), and Test III (Test II, designated 
as Section A and a new part, Section B); cer- 
tain sections of the Iowa Chemistry Aptitude 
Test; the high school average mark; the high 
school percentile rank; and the pre-profes- 
sional honor point ratio all were found in 
general to have relatively greater predictive 
value than the other variables considered. 

These exploratory studies also furnished 
considerable evidence to indicate that it was 
necessary to make separate independent 
studies on the students who entered the Col- 
lege of Pharmacy as freshmen and on those 
who entered the College of Pharmacy as
sophomores. The relative value of the different
predictive variables for the 1935-1936 fresh- 
man group was indicated to be slightly dif- 
ferent from the relative value of the different 
predictive variables for the 1935-1936 soph- 
omore group. In general, the Johnson Science 
Survey Test III-A, the Pharmacy Mathe- 
matics Test II, and the high school average 
mark were found to be the most valuable pre- 
dictive variables for the 1935-1936 freshman 
group. For the 1935-1936 sophomore group, 
the pre-professional honor point ratio and the 
high school average mark were indicated to 
be the most valuable variables, with the 
selected sections of the Iowa Chemistry Apti- 
tude Test probably being more valuable than 
the other remaining variables. In the 1937-38 
study it was found that the basic data for the 
students who entered as freshmen and for 
those who entered as sophomores were not 
sufficiently homogeneous to justify combining 
them into one group. 

Although the number of cases for each of 
the exploratory studies was small and the 
findings varied considerably from one study 
to another so that inferences drawn from the 
findings of any one study would probably be 
misleading, the results of these studies taken 
as a whole appeared to warrant the following 
statement: The relative predictive value of 
the Johnson Science Survey Test III, the 
Pharmacy Mathematics Test II, Parts 2 and 
3 of the Iowa Chemistry Aptitude Test, the 
Pharmacy Problem Solving Test, the high 
school average mark (or the high school per- 
centile rank), and the pre-professional honor 
point ratio appeared to be high enough to 
justify their further consideration on larger 
groups in independent studies of entering 
freshman and sophomore students. 


THE MAJOR STUDIES


There were two major studies in this inves- 
tigation. The first one, which was based on 
students who entered the College of Pharmacy 
as freshmen, will be presented in some detail; 
the second one, based on students who entered 
the College of Pharmacy as sophomores, was 
similar to the freshman study and will be 
summarized more briefly. 

The Freshman Study. The first part of
this study, which was concerned with the 
determination of the relative value of the dif- 
ferent variables for predicting certain selected 
criteria of success, was based on students 
entering the College of Pharmacy as freshmen
from the fall quarter of 1938 through the fall 
quarter of 1941. The second part of the 
study, which was concerned with the practical 
application of the findings of the first part, 
was based on students who entered the Col- 
lege of Pharmacy as freshmen from the fall 
quarter of 1933 through the fall quarter of 
1943. 

There were 155 students who entered the 
College of Pharmacy as freshmen in the fall 
quarters from 1938 through 1941, who had 
complete data on the predictive variables, 
and who completed at least the first quarter 
of the freshman year. Thirty-five of these 
students entered in the fall of 1938, 48 in the 
fall of 1939, 42 in the fall of 1940, and 30 
in the fall of 1941. Before these four groups 
could be combined into one group for analysis,
it was necessary to test for the homogeneity
of the basic data. The following predictive
variables were considered in this
study: the Pharmacy Mathematics Test II;
the Johnson Science Survey Test III, Sections
A and B; the Pharmacy Problem Solving
Test; the Iowa Chemistry Aptitude Test,
Part 2 (all three paragraphs, each timed
separately) and Part 3; and the high school
percentile rank. The criteria of success were
those that were mentioned earlier in this 
report. 

The tests of homogeneity provided prac- 
tically no evidence to indicate that the four 
different entering groups were not random 
samples from the same population. The anal- 
ysis of variance, which was used to test for 
the homogeneity of the means of the four 
groups, indicated that on all except one of the 
predictive variables and on all of the criteria 
the observed differences between the means 
for the four entering groups could be attrib- 
uted to errors of random sampling; in all of 
the comparisons made, only that for the mean 
scores on the Johnson Science Survey Test 
III-A yielded a probability as small as .01.
The tests for homogeneity of variances (the
Welch-Nayer L₁ test) indicated that the four
groups were homogeneous as to variability on
the different predictive variables and the
various criteria, all of the probabilities being
greater than .01.
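
The test on the means described above is a one-way analysis of variance. A minimal Python sketch of the F ratio is given here for illustration; the function name and the scores are hypothetical and are not data or computations from the study.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means about the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the individual scores about their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical test scores for four entering classes (NOT data from the study).
hypothetical_groups = [
    [10, 12, 9, 11, 13],
    [11, 10, 12, 9, 14],
    [12, 11, 10, 13, 9],
    [9, 13, 11, 12, 10],
]
f_ratio, df_between, df_within = one_way_anova_f(hypothetical_groups)
```

A small F ratio, as in this made-up example, is what one expects when the groups are random samples from the same population. With four entering groups and 155 students the degrees of freedom would be 3 and 151.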

The results of the correlation and regression
analyses, summarized in the next few
paragraphs, indicated that the Johnson Science
Survey Test III, the Pharmacy Mathematics
Test II, and the high school percentile
rank had greater relative predictive value 
than did the other variables considered for 
predicting the success of this group of enter- 
ing freshmen. 


The zero-order correlation coefficients be- 
tween the different predictive variables and 
the various criteria for this group of students 
(Table I) indicated that when the variables 
were considered as single predictors, the high 
school percentile rank, the Johnson Science 
Survey Test III-Total, the Pharmacy Mathe- 
matics Test II, the Johnson Science Survey 
Test III-A, and the Johnson Science Survey 
Test III-B were more valuable for predicting 
the various criteria than were the different 
sections of the Iowa Chemistry Aptitude Test 
or the Pharmacy Problem Solving Test. The 
relative value of the different predictive vari- 
ables was much the same when each of the 
variables was combined with the high school 
percentile rank for two-variable multiple cor- 
relation coefficients with the different criteria, 
as it was when the variables were considered 
as individual predictors (Table II). The 
Johnson Science Survey Test III-Total, the 
Pharmacy Mathematics Test II, the Johnson 
Science Survey Test III-A, and the Johnson 
Science Survey Test III-B each combined 
with the high school percentile rank, in gen- 
eral, yielded higher two-variable multiple 
correlation coefficients for the various criteria 
than did any of the other variables combined 
with the high school percentile rank. 

The high school percentile rank, the Phar- 
macy Mathematics Test II, the Johnson Sci- 
ence Survey Test III-A, and the Johnson 
Science Survey Test III-B were found, in 
general, to be more valuable than the Iowa 
Chemistry Aptitude Test, Part 2 and the 
Iowa Chemistry Aptitude Test, Part 3 when 
these six variables were studied in a six- 
variable multiple combination for predicting 
each of the various criteria considered for 
this group of students who entered the Col- 
lege of Pharmacy as freshmen. The high 
school percentile rank and the Pharmacy 
Mathematics Test II had standard partial 
regression coefficients for predicting each of 
the different criteria that were large enough 
to be significantly different from zero, most 
of them being significant on the 1 per cent 
level and the remaining ones being significant 
on the 5 per cent level. The Johnson Science 
Survey Test III-A and the Johnson Science 
Survey Test III-B had standard partial regression
coefficients which varied more in size
from one criterion to another than did those 
for the Pharmacy Mathematics Test II and 
those for the high school percentile rank, a 
few of them being large enough to be signifi- 
cant on the 1 per cent level, some of them 
being large enough to be significant on the 
5 per cent level, and others being too small 
to be significantly different from zero. The 
other two variables in this six-variable com- 
bination, Part 2 and Part 3 of the Iowa 
Chemistry Aptitude Test, were found to have 
standard partial regression coefficients which 
were too small to be significantly different 
from zero for predicting all of the various 
criteria considered. 


When the high school percentile rank, the 
Pharmacy Mathematics Test II, and the 
Johnson Science Survey Test III-Total (Sec- 
tion A + Section B) were studied in a 
three-variable multiple regression equation 
for predicting each of the various criteria, it 
was found that all three variables were valu- 
able for predicting most of the honor point 
ratio criteria; however, only the high school 
percentile rank and the Pharmacy Mathe- 
matics Test II were found to be valuable for 
predicting the two State Board Examination 
criteria (Table III). 


The high school percentile rank, the John- 
son Science Survey Test III-Total, and the 
Pharmacy Mathematics Test II all were 
found to have standard partial regression 
coefficients which were significant on the 1 per 
cent level for predicting four of the honor 
point ratio criteria, and the standard partial 
regression coefficients or betas for each of 
these three variables were approximately the 
same for the four criteria. The betas for the 
high school percentile rank were .3575, .4108, 
.3965, and .4130 respectively, for the first
quarter first year honor point ratio, the full 
first year honor point ratio, the first year 
honor point ratio, and the total honor point 
ratio; the betas for the Johnson Science Sur- 
vey Test III-Total were .3249, .2656, .2600, 
and .2362 respectively, for the same four criteria;
and the betas for the Pharmacy Mathematics
Test II were .1916, .2125, .1954, and
.1990 respectively, for the same four criteria.
The multiple correlation coefficients were also 
much the same, being .603, .618, .592, and 
.594 respectively, for the four criteria. 
For predicting the total honor point ratio
for those earning a degree, the high school 
percentile rank and the Pharmacy Mathe- 
matics Test II were found to have standard 
partial regression coefficients which were sig- 
nificant on the 1 per cent level, but the beta 
for the Johnson Science Survey Test III- 
Total, although relatively not small, was not 
significantly different from zero as the num- 
ber of cases was small for this criterion 
(N = 47). For predicting the first quarter 
first year honor point ratio for those earning 
a degree, all three variables had standard 
partial regression coefficients which were sig- 
nificant on either the 1 per cent or the 5 per 
cent level. The high school percentile rank 
and the Pharmacy Mathematics Test II were 
found to have betas that were significant on 
the 5 per cent level for predicting the two 
State Board Examination criteria, but the 
betas for the Johnson Science Survey Test 
III-Total were not significant. 
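
The significance statements above rest on the ratio of each beta to its standard error. The Python sketch below illustrates this with the Table III entries for the total honor point ratio of those earning a degree (N = 47, 43 degrees of freedom). The critical values of t are approximate figures assumed from standard tables, and the function and variable names are hypothetical.

```python
# Approximate two-tailed critical values of t for 43 degrees of freedom.
# These are assumptions taken from standard t tables, not from the paper.
T_CRITICAL_5_PER_CENT = 2.017
T_CRITICAL_1_PER_CENT = 2.695


def significance_level(beta, standard_error):
    """Return (t, label) for testing whether a beta differs from zero."""
    t = beta / standard_error
    if abs(t) >= T_CRITICAL_1_PER_CENT:
        return t, "1 per cent"
    if abs(t) >= T_CRITICAL_5_PER_CENT:
        return t, "5 per cent"
    return t, "not significant"


# (beta, standard error) pairs as printed in Table III for this criterion.
degree_group_betas = {
    "high school percentile rank": (0.4336, 0.1126),
    "Johnson Science Survey Test III-Total": (0.2269, 0.1246),
    "Pharmacy Mathematics Test II": (0.3625, 0.1224),
}
results = {
    name: significance_level(beta, se)
    for name, (beta, se) in degree_group_betas.items()
}
```

The Johnson beta of .2269 is not small, but with only 47 cases its standard error is large enough that the ratio falls short of the 5 per cent point.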


This same three-variable combination was 
also indicated to be of value for predicting 
the first quarter second year honor point 
ratio, the full second year honor point ratio, 
and the second year honor point ratio. The 
standard partial regression coefficients for the 
high school percentile rank were .4195, .3717, 
and .3998 respectively, for these three cri- 
teria, all three betas being significant on the 
1 per cent level; the betas for the Pharmacy 
Mathematics Test II were .2959, .2205, and 
.2090 respectively, for the three criteria, the 
first beta being significant on the 1 per cent 
level and the last two on the 5 per cent level; 
the betas for the Johnson Science Survey Test 
III-Total were .1450, .1890, and .1598 re- 
spectively, for the three criteria, the second 
of the three betas being significant with a 
probability equal to .05 and the other two
being large enough to approach rather closely 
the 5 per cent level of significance. The mul- 
tiple correlation coefficient for this three- 
variable combination was .615 with the first 
quarter second year honor point ratio, .543 
with the full second year honor point ratio, 
and .546 with the second year honor point 
ratio. 
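
The F values reported in Table III for the multiple correlation coefficients can be checked approximately from R, N, and the number of predictive variables. The Python sketch below uses the standard relation F = (R²/k) / ((1 − R²)/(N − k − 1)); the function name is hypothetical, and small discrepancies are to be expected because the printed R values are rounded to three places.

```python
def f_from_multiple_r(r, n_cases, n_predictors=3):
    """Return (F, df1, df2) for testing a multiple correlation coefficient."""
    r_squared = r * r
    df1 = n_predictors
    df2 = n_cases - n_predictors - 1
    f_ratio = (r_squared / df1) / ((1.0 - r_squared) / df2)
    return f_ratio, df1, df2


# Full first year honor point ratio: R = .618, N = 134 (tabled F = 26.72).
f_first_year, df1_first_year, df2_first_year = f_from_multiple_r(0.618, 134)

# First quarter second year honor point ratio: R = .615, N = 105 (tabled F = 20.44).
f_second_year, df1_second_year, df2_second_year = f_from_multiple_r(0.615, 105)
```

Both results agree with the tabled F values to within rounding, and the degrees of freedom come out as 3/130 and 3/101 respectively.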

On the basis of the correlation and regres- 
sion analyses and the characteristics of the 
various criteria, three of the regression equa- 
tions developed were chosen to be considered 
further as prediction formulas for estimating 
the future success of students applying for 
admission to or actually enrolling in the College
of Pharmacy as freshmen. The best combination
of predictive variables was indicated
to be the three-variable combination of the
Pharmacy Mathematics Test II, the Johnson
Science Survey Test III-Total, and the high
school percentile rank; the first formula
chosen was that for predicting the first quarter
first year honor point ratio from this
three-variable combination, the second that
for predicting the first year honor point ratio
(based on one, two, or three quarters of
work) from the same three variables, and the
third that for predicting the first quarter
second year honor point ratio from the same
three variables.

TABLE III

SUMMARY OF MULTIPLE CORRELATION AND REGRESSION DATA FOR EACH OF THE VARIOUS CRITERIA
OF SUCCESS, WITH A COMBINATION OF THREE INDEPENDENT VARIABLES FOR STUDENTS
ENTERING THE COLLEGE OF PHARMACY AS FRESHMEN IN THE FALL QUARTERS
OF 1938, 1939, 1940, AND 1941

Criteria of Success
                  Pharmacy          Johnson Science   High School
                  Mathematics       Survey Test       Percentile
                  Test II           III-Total         Rank

First Quarter First Year Honor Point Ratio
R = .603   N = 155   F = 28.71   df = 3/151   P = <.01
    β             .1916             .3249             .3575
    σβ            .0697             .0681             .0668
    t             2.75              4.77              5.35
    P             <.01              <.01              <.01

Full First Year Honor Point Ratio
R = .618   N = 134   F = 26.72   df = 3/130   P = <.01
    β             .2125             .2656             .4108
    σβ            .0740             .0723             .0710
    t             2.87              3.67              5.79
    P             <.01              <.01              <.01

First Year Honor Point Ratio
R = .592   N = 155   F = 27.20   df = 3/151   P = <.01
    β             .1954             .2600             .3965
    σβ            .0703             .0688             .0675
    t             2.78              3.78              5.88
    P             <.01              <.01              <.01

Total Honor Point Ratio
R = .594   N = 155   F = 27.41   df = 3/151   P = <.01
    β             .1990             .2362             .4130
    σβ            [illegible]       .0687             .0674
    t             2.83              3.44              6.13
    P             <.01              <.01              <.01

Total Honor Point Ratio (For Those Earning a Degree)
R = .692   N = 47   F = 13.19   df = 3/43   P = <.01
    β             .3625             .2269             .4336
    σβ            .1224             .1246             .1126
    t             2.96              1.82              3.85
    P             <.01              >.05              <.01

First Quarter First Year Honor Point Ratio (For Those Earning a Degree)
R = .689   N = 47   F = 12.92   df = 3/43   P = <.01
    [Only fragments of this block are legible: .3606; .1280; 2.19; <.05 >.01; 2.88.
    Their column positions cannot be determined.]

Score on State Board Examination (Excluding the Practical Test)
R = .552   N = 45   F = 5.98   df = 3/41   P = <.01
    β             [illegible]       [illegible]       [illegible]
    σβ            .1448             [illegible]       [illegible]
    t             2.20              [illegible]       [illegible]
    P             <.05 >.01         [illegible]       [illegible]

Score on State Board Examination (Including the Practical Test)
R = .591   N = 37   F = 5.91   df = 3/33   P = <.01
    β             .3972             [illegible]       [illegible]
    σβ            .1561             [illegible]       [illegible]
    t             2.54              [illegible]       2.60
    P             <.05 >.01         [illegible]       <.05 >.01

First Quarter Second Year Honor Point Ratio
R = .615   N = 105   F = 20.44   df = 3/101   P = <.01
    β             .2959             .1450             .4195
    σβ            .0830             .0803             .0814
    t             3.56              [illegible]       5.15
    P             <.01              >.05              <.01

Full Second Year Honor Point Ratio
R = .543   N = 86   F = 11.40   df = 3/82   P = <.01
    β             .2205             .1890             .3717
    σβ            .0981             .0948             .0962
    t             2.25              1.99              3.86
    P             <.05 >.01         .05               <.01

Second Year Honor Point Ratio
R = .546   N = 105   F = 14.29   df = 3/101   P = <.01
    β             .2099             .1598             .3998
    σβ            .0882             .0853             .0865
    t             2.38              1.87              4.62
    P             <.05 >.01         >.05              <.01

Underlying the statistical methods of correlation
and regression used in this study are
the assumptions that the regressions are
linear in form and that the variances in the
different arrays are equal; therefore, the validity
of these assumptions was determined for
each of the relationships involved in the
selected prediction formulas. There was practically
no evidence to indicate that these
assumptions of linearity of regression and
homoscedasticity were not valid; all of the
tests of homoscedasticity indicated that the
variances within arrays were equal, and only
one of the tests of linearity, that for the
regression of the Johnson Science Survey Test
III-Total on the Pharmacy Mathematics Test
II, indicated a tendency toward non-linearity,
the probability being between .05 and .01 that
the departure from linear regression was significant.

The purpose in developing prediction
formulas was to use them for guidance purposes
for new groups of entering students;
therefore, it was necessary to determine
whether the prediction formulas developed on 
an experimental sample could be used validly 
for other new groups of entering students, 
that is, to determine whether the different 
groups of entering students could be consid- 
ered as random samples from the same popu- 
lation. During the period of this investigation, 
twelve different groups of students had 
entered the College of Pharmacy as freshmen; 
therefore, they were used to test the assump- 
tion that the different entering groups of 
freshmen were not significantly different from 
one another. Data on the high school percentile
rank were available for all twelve of these
groups, on the Johnson Science Survey Test
III-Total for ten of the twelve groups, on the
Pharmacy Mathematics Test II for ten of the 
twelve groups, on the first quarter first year 
honor point ratio for seven of the groups, on 
the first year honor point ratio for seven of 
the groups, and on the first quarter second 
year honor point ratio for six of the groups. 
The analysis of variance technique was used 
to test the homogeneity of the means of the 
different groups on the different predictive 
variables and the different criteria, and the 
Welch-Nayer technique was used to test the
homogeneity of variances. The tests of homo- 
geneity on the three predictive variables and 
the three criteria, based on six to twelve 
entering groups of freshmen, indicated that 
the different entering groups could be consid- 
ered as random samples from the same popu- 
lation. 
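
The Welch-Nayer technique is understood here to refer to significance tables for the L₁ criterion of Neyman and Pearson, which compares a weighted geometric mean of the group variances with their weighted arithmetic mean. The Python sketch below computes the criterion as it is usually defined, with each variance taken on a divisor of n; that definition, the function name, and the sample data are assumptions of this sketch and are not drawn from the study. Judging significance requires the published percentage points, which are not reproduced here.

```python
import math


def l1_criterion(groups):
    """Return the L1 criterion for homogeneity of variances.

    Assumes every group has at least two distinct values, so that no
    group variance is zero.
    """
    n_total = sum(len(g) for g in groups)
    log_geometric_mean = 0.0
    arithmetic_mean = 0.0
    for g in groups:
        n = len(g)
        group_mean = sum(g) / n
        variance = sum((x - group_mean) ** 2 for x in g) / n  # divisor n
        weight = n / n_total
        log_geometric_mean += weight * math.log(variance)
        arithmetic_mean += weight * variance
    return math.exp(log_geometric_mean) / arithmetic_mean


# Hypothetical scores (NOT data from the study).
equal_spread = l1_criterion([[1, 2, 3], [11, 12, 13]])
unequal_spread = l1_criterion([[1, 2, 3], [10, 20, 30]])
```

The criterion equals 1 when all group variances are identical and falls toward 0 as they diverge, so values near 1 support treating the groups as homogeneous in variability.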


It would appear that the findings from the 
experimental sample, which included four of 
the twelve entering groups, could be used 
validly for the other entering groups consid- 
ered here. In fact, it would appear that a 
sufficient number of different entering groups 
were considered, particularly on the predic- 
tive variables, to warrant the conclusion that 
the findings from the experimental sample 
could be used validly for guidance purposes 
for other similar entering groups of freshmen 
students not included in this study of entering 
freshmen. 

The final step in this freshman study was 
to translate the findings of the study into 
form for practical use by changing the stand- 
ard partial regression equations over into pre- 
diction formulas in raw score form and 
setting up and illustrating methods for inter- 
preting, for guidance purposes, the results 
obtained by using the prediction formulas. 
To illustrate these techniques, only one of the 
three regression equations which were chosen 
for practical application will be discussed 
here. 

The first prediction formula previously 
chosen to be used was that for predicting the 
first quarter first year honor point ratio (Y₁)
from the multiple combination of three independent
variables, the Pharmacy Mathematics
Test II (x₁), the Johnson Science
Survey Test III-Total (x₂), and the high
school percentile rank (x₃). The standard


[Vol. 14, No. 3 


partial regression equation (relative deviate 
score form) for this three-variable multiple 
relationship, based on 155 freshmen who 
entered the College of Pharmacy in the fall 
quarters of 1938, 1939, 1940, and 1941, was 
as follows: 


\[ Y_1' = .1916\,x_1' + .3249\,x_2' + .3575\,x_3' \]


The partial regression equation (raw score 
form) or prediction formula corresponding to 
the above standard partial regression equa- 
tion was as follows: 

\[ Y_1 = .04484\,x_1 + .03412\,x_2 + .01180\,x_3 - 1.2598 \]
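
The conversion from relative deviate score form to raw score form is not spelled out in this summary. A sketch of the usual relations is given here; the symbols $\sigma_{Y_1}$ and $\sigma_{x_i}$ (sample standard deviations), $\bar{Y}_1$ and $\bar{x}_i$ (sample means), $b_i$ (raw score coefficients) and $a$ (the constant term) are assumptions of this sketch.

```latex
\[
b_i = \beta_i\,\frac{\sigma_{Y_1}}{\sigma_{x_i}}, \qquad
a = \bar{Y}_1 - \sum_{i=1}^{3} b_i\,\bar{x}_i
\]
```

On this reading, the published coefficients imply a ratio $\sigma_{Y_1}/\sigma_{x_1}$ of about $.04484/.1916 \approx .234$ for the Pharmacy Mathematics Test II. Substituting the sample means that appear in the standard error formula (10.3935, 27.1548, and 60.8258) into the prediction formula gives an implied mean first quarter first year honor point ratio of about 0.85.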


The formula for the standard error of a 
specific predicted value (a mean value) deter- 
mined for an individual from the above par- 
tial regression equation or prediction formula 
was as follows: 

\[
\sigma_{Y_1(x_1 x_2 x_3)} = \sqrt{\,.002937\,\bigl[\,1
  + .090454\,(x_1 - 10.3935)^2
  + .017415\,(x_2 - 27.1548)^2
  + .001655\,(x_3 - 60.8258)^2
  - .022577\,(x_1 - 10.3935)(x_2 - 27.1548)
  - .005220\,(x_1 - 10.3935)(x_3 - 60.8258)
  - .000384\,(x_2 - 27.1548)(x_3 - 60.8258)\,\bigr]}
\]


Given an applicant’s Pharmacy Mathematics
Test II score (x₁), Johnson Science Survey
Test III-Total score (x₂), and high school
percentile rank (x₃), the most likely honor
point ratio (Y₁) which he may be expected
to earn for the first quarter first year of
course work in the College of Pharmacy may
be predicted from the above partial regression
equation or prediction formula. This
estimate is the most likely value in the sense
that it is the expected average or mean for
all individuals with a given x₁, x₂, and x₃. The
standard error of the most likely first quarter
first year honor point ratio for the individual
may be estimated from the formula for the
standard error of a specific predicted mean
value by using the same three values of the
independent variables as were used in the
prediction formula for obtaining the most
likely honor point ratio. The standard error
of the specific predicted mean value, that is,
the most likely honor point ratio, provides a
measure of the reliability of the specific predicted
mean value from the standpoint of
errors of random sampling. Three examples 
are given to illustrate the use of these two 
formulas. 


Student A entered the College of Pharmacy
as a freshman in the fall of 1942 and therefore
was not included in the experimental sample
on which the prediction formula and the
standard error formula were based. Student
A had a score of 12 on the Pharmacy Mathematics
Test II, a score of 22 on the Johnson
Science Survey Test III-Total, and a high
school percentile rank of 60. Substituting
these three values in the partial regression
equation, or prediction formula, a value of
0.74 was obtained as the most likely first
quarter first year honor point ratio for Student A.
Substituting the same three values in
the formula for the standard error, a value
of .074 on the honor point ratio scale was
obtained for the standard error of the most
likely honor point ratio (0.74). Using the
standard error (.074) to determine the reliability
of the most likely honor point ratio
(0.74) for Student A, one may say that the
interval on the honor point ratio scale extending
from 0.59 to 0.89 will include or
cover the true mean first quarter first year
honor point ratio of all individuals in the
population whose scores are 12, 22, and 60
respectively, on the Pharmacy Mathematics
Test II, the Johnson Science Survey Test
III-Total, and the high school percentile
rank, and that one may be confident that this
statement is correct 95 times out of 100.
These limits were determined by multiplying
the standard error (.074) by 1.976, which is
the value of “t” at the 5 per cent level for
151 degrees of freedom, and adding the resulting
value (0.15) to, and subtracting it from, the
predicted mean value (0.74); that is, the limits
are equal to 0.74 ± (1.976)(.074).
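
The limits just described are the predicted mean value plus and minus “t” times its standard error. A minimal Python sketch of that step follows; the function name `confidence_limits` is an illustrative assumption, and the default multiplier is the value 1.976 quoted in the text for 151 degrees of freedom.

```python
def confidence_limits(predicted_mean, standard_error, t_value=1.976):
    """Lower and upper 95 per cent limits for the true mean honor point ratio.

    t_value defaults to the figure given in the text (5 per cent level,
    151 degrees of freedom). The function name is hypothetical.
    """
    margin = t_value * standard_error
    return predicted_mean - margin, predicted_mean + margin


# Student A: predicted mean value 0.74, standard error .074
lower, upper = confidence_limits(0.74, 0.074)
print(f"{lower:.2f} to {upper:.2f}")  # prints 0.59 to 0.89
```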

Student B also entered the College of
Pharmacy as a freshman in the fall quarter
of 1942; Student B had a score of 16 on the
Pharmacy Mathematics Test II, a score of
39 on the Johnson Science Survey Test III-Total,
and a high school percentile rank of
96. Substituting these three values in the partial
regression equation for predicting the
first quarter first year honor point ratio, a
value of 1.92 was obtained as the most likely
first quarter first year honor point ratio for
Student B. Substituting the same three values
in the formula for the standard error of a
specific predicted mean value, a value of .129
on the honor point ratio scale was obtained
for the standard error of the most likely
honor point ratio (1.92). It should be noted
that the standard error (.129) for the predicted
mean value of 1.92 for Student B is
considerably larger than the standard error
(.074) for the predicted mean value of 0.74
for Student A; this difference occurs because
the predicted mean value of 1.92 is near the
extreme and the predicted mean value of 0.74
is near the center of the distribution, and the
sampling error is larger at the extremes than
near the center of the distribution. Using the
standard error (.129) to determine the reliability
of the most likely value (1.92) for
Student B, one may say that the interval on
the honor point ratio scale, 1.92 ± (1.976)(.129),
extending from 1.67 to 2.17 will include
or cover the true mean first quarter first
year honor point ratio of all individuals in
the population whose scores are 16, 39, and
96 respectively, on the Pharmacy Mathematics
Test II, the Johnson Science Survey
Test III-Total, and the high school percentile
rank, and that one may be confident that this
statement is correct 95 times out of 100.


Student C, who also entered the College of
Pharmacy as a freshman in the fall quarter
of 1942, had a score of 11 on the Pharmacy
Mathematics Test II, a score of 13 on the
Johnson Science Survey Test III-Total, and
a high school percentile rank of 30. Substituting
these three values in the two equations,
a value of 0.03 was obtained as the most
likely first quarter first year honor point ratio
for Student C, and a value of .135 on the
honor point ratio scale was obtained for the
standard error of the most likely value (0.03).
It should be noted again that the standard
error (.135) for the predicted mean value of
0.03 for Student C is about the same as the
standard error (.129) for the predicted mean
value of 1.92 for Student B, but is considerably
larger than the standard error (.074) for the
predicted mean value of 0.74 for Student A.
The predicted mean value of 1.92 is near one
end of the distribution; the predicted mean
value of 0.03 is near the other end of the
distribution, and the predicted mean value of
0.74 is near the center of the distribution.
Using the standard error (.135) to determine
the reliability of the most likely first quarter
first year honor point ratio (0.03) for Student
C, one may say that the interval, 0.03 ±
(1.976)(.135), extending from -0.24 to 0.30
on the honor point ratio scale will include or
cover the true mean first quarter first year
honor point ratio of all individuals in the
population whose scores on the Pharmacy
Mathematics Test II, the Johnson Science
Survey Test III-Total, and the high school
percentile rank are 11, 13, and 30 respectively,
and that one may be confident that
this statement is correct 95 times out of 100.


Another method of interpreting the most 
likely first quarter first year honor point ratio 
is by means of the “standard error of esti- 
mate” or standard deviation of the residual 
variation. The “standard error of estimate” 
for predicting the first quarter first year honor 
point ratio from the above partial regression 
equation was .673 on the honor point ratio 
scale. Using Student A, who had a most likely
first quarter first year honor point ratio of
0.74, to illustrate this method, one may say
that the interval, 0.74 ± (1.976)(.673), extending
from -0.59 to 2.07 on the honor
point ratio scale will, on the average, include
or cover the actual earned first quarter first 
year honor point ratio of all individuals in 
the population whose scores are 12, 22, and 
60 respectively, on the Pharmacy Mathe- 
matics Test II, the Johnson Science Survey 
Test III-Total, and the high school percentile 
rank, and that one may be confident that 
this statement is correct 95 times out of 100. 
Student A actually did earn a first quarter 
first year honor point ratio of 1.33, which is 
within the limits set up. It should be pointed 
out that the “standard error of estimate” is 
an average value and does not hold for any 
one specific predicted mean value (most 
likely value) except by chance, but the stand- 
ard error of the specific predicted mean value 
is a specific value for the particular predicted 
mean value for the individual. 
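
The interval based on the “standard error of estimate” concerns the ratio an individual actually earns, not the mean of all similar individuals, and is accordingly much wider. The Python sketch below computes it and checks whether an earned ratio falls inside it; the names `individual_limits` and `covers` are hypothetical, and the constants .673 and 1.976 are the values quoted in the text.

```python
T_VALUE = 1.976                     # "t" at the 5 per cent level, 151 degrees of freedom
STANDARD_ERROR_OF_ESTIMATE = 0.673  # standard deviation of the residual variation


def individual_limits(predicted, t_value=T_VALUE, see=STANDARD_ERROR_OF_ESTIMATE):
    """Limits expected, on the average, to cover an individual's earned ratio."""
    margin = t_value * see
    return predicted - margin, predicted + margin


def covers(limits, earned):
    """True when the earned honor point ratio lies within the limits."""
    lower, upper = limits
    return lower <= earned <= upper


# Student A: predicted 0.74, actually earned 1.33
limits = individual_limits(0.74)
print(tuple(round(v, 2) for v in limits), covers(limits, 1.33))  # prints (-0.59, 2.07) True
```

With these figures the margin is about nine times that obtained from the standard error of the specific predicted mean value (.074) for the same student.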

In order to facilitate the use of the findings 
of this freshman study, tables of probabilities 
were prepared, which would assist the ad- 
visers of students in interpreting an individ- 
ual’s most likely honor point ratio as pre- 
dicted from one or the other of the three 
prediction formulas developed. The probabil- 
ity tables were determined on the experi- 
mental sample, that is, on the freshmen who 
entered in the fall quarters of 1938, 1939, 
1940, and 1941, and on whom the prediction 
formulas were developed. Briefly, each probability
table was based on the joint distribution
of the most likely honor point ratios
predicted from the partial regression equation 
or the prediction formula and the actual 
earned honor point ratios, for the students in 
the experimental sample. Three probability 
tables were prepared, one for use in inter- 
preting the most likely honor point ratios 
obtained from the prediction formula for pre- 
dicting the first quarter first year honor point 
ratio, one for interpreting the most likely
honor point ratios obtained from the predic- 
tion formula for predicting the first year 
honor point ratio, and one for interpreting 
the most likely honor point ratios obtained 
from the prediction formula for predicting 
the first quarter second year honor point 
ratio. The values in these tables represent the 
maximum probability because the percentages 
were based on frequencies that not only in- 
cluded those who had a predicted honor point 
ratio below a specified range and an earned 
honor point ratio equal to or above a par- 
ticular level, but also neglected those who had 
a predicted honor point ratio in the specified 
range but had an earned honor point ratio 
below that particular honor point ratio level. 
The method used in developing these prob- 
ability tables is that which was reported by 
Freeman and Johnson* in a study of the prediction
of success in the College of Agriculture,
Forestry, and Home Economics at the
University of Minnesota. 
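
The general idea of a table built from the joint distribution of predicted and earned ratios can be sketched in Python. The sketch below is a simplified per-band tabulation, not the Freeman and Johnson procedure itself, whose cumulative frequencies are not reproduced here. The function name, the band edges, and all of the data are invented for illustration.

```python
def chances_table(pairs, band_edges, levels):
    """Chances in 100 of earning a ratio at or above each level, per predicted band.

    pairs:      (predicted, earned) honor point ratios for the sample
    band_edges: ascending lower bounds of the predicted-ratio bands
    levels:     earned-ratio levels of interest, e.g. 0.00 (D), 1.00 (C), 2.00 (B)
    All names here are hypothetical; this is a simplified tabulation.
    """
    table = {}
    for i, low in enumerate(band_edges):
        high = band_edges[i + 1] if i + 1 < len(band_edges) else float("inf")
        earned = [e for p, e in pairs if low <= p < high]
        if not earned:
            continue  # no students were predicted in this band
        table[low] = {
            level: round(100 * sum(e >= level for e in earned) / len(earned))
            for level in levels
        }
    return table


# Hypothetical (predicted, earned) pairs, invented for illustration only.
sample = [(0.2, -0.3), (0.4, 0.6), (0.7, 0.1), (0.9, 1.2),
          (1.1, 0.8), (1.3, 1.6), (1.6, 2.1), (1.8, 1.4)]
table = chances_table(sample, band_edges=[0.0, 1.0], levels=[0.0, 1.0, 2.0])
for band, row in table.items():
    print(band, row)
```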

The utility of these probability tables will 
be illustrated by using one of them, Table IV, 
to interpret the predicted first quarter first 
year honor point ratios for two students who 
entered the College of Pharmacy as freshmen 
in the fall quarter of 1942. Student D had a 
score of 8 on the Pharmacy Mathematics 
Test II, a score of 26 on the Johnson Science 
Survey Test III-Total, and a high school per- 
centile rank of 55. Substituting these three 
values in the regression equation given above, 
a predicted honor point ratio of 0.63 ($Y_1$)
was obtained as the most likely first quarter
first year honor point ratio for Student D.
Entering Table IV with a $Y_1$ equal to 0.63,
one finds that there are 36 chances in 100 of
Student D’s making an honor point ratio
equal to or above -0.50, 32 chances in 100
of his making an honor point ratio equal to
or above 0.00 (D), 15 chances in 100 of his
making an honor point ratio equal to or above
1.00 (C), and 10 chances in 100 of his making
an honor point ratio equal to or above
2.00 (B) during the first quarter of his first
year in the College of Pharmacy; Student D
actually made an honor point ratio of 0.93
during the first quarter of his first year.

* Edward M. Freeman and Palmer O. Johnson, “Prediction
of Success in the College of Agriculture, Forestry, and Home
Economics,” University of Minnesota Studies in Predicting
Scholastic Achievement, Part One (Minneapolis: University
of Minnesota Press, 1942), pp. 61-65.

TABLE IV
PROBABILITY TABLE GIVING THE CHANCES IN 100 THAT A STUDENT ENTERING THE COLLEGE OF
PHARMACY AS A FRESHMAN WITH A PARTICULAR PREDICTED FIRST QUARTER FIRST YEAR
HONOR POINT RATIO WILL EARN A FIRST QUARTER FIRST YEAR HONOR POINT RATIO
EQUAL TO OR ABOVE DIFFERENT SPECIFIED FIRST QUARTER FIRST YEAR HONOR POINT RATIOS

[Column headings: Predicted First Quarter First Year Honor Point Ratio (Y1); Chances in 100 of Earning an Honor Point Ratio Equal to or Above. Table body not reproduced.]

Student E had a score of 13 on the Pharmacy 
Mathematics Test II, a score of 29 on the 
Johnson Science Survey Test III-Total, and 
a high school percentile rank of 64. Using 
these three values in the regression equation, 
a predicted honor point ratio ($Y_1$) of 1.07
was obtained as the most likely first quarter
first year honor point ratio for Student E.
Entering Table IV with $Y_1$ equal to 1.07, one
finds that there are 78 chances in 100 of Student
E’s making an honor point ratio equal
to or above -0.50, 75 chances in 100 of his
making an honor point ratio equal to or above 
0.00 (D), 58 chances in 100 of his making 
an honor point ratio equal to or above 1.00 
(C), and 40 chances in 100 of his making an 
honor point ratio equal to or above 2.00 (B) 
during the first quarter of his first year in the 
College of Pharmacy. Student E actually 
made a first quarter first year honor point 
ratio of 1.12. 

Information such as has been illustrated in 
the past few paragraphs might be furnished 
the student advisers, either in the Admissions 
Office, the University Student Counseling 
Bureau, or the College of Pharmacy, to aid 
them in the guidance of students applying for 
admission to or actually enrolled in the Col- 
lege of Pharmacy as freshmen. 

The Sophomore Study—This study is 
similar to the freshman study except that it
was made on students who entered the College
of Pharmacy with advanced standing as
sophomores instead of on students who 
entered as freshmen. The first part of the 
study was concerned with the determination 
of the relative value of the different variables 
for predicting certain criteria of success, and 
was based on students who entered the Col- 
lege of Pharmacy as sophomores from the 
fall quarter of 1938 through the fall quarter 
of 1941. The second part of the study was 
concerned with the practical application of 
the findings of the first part, and was based 
on students who entered the College of Phar- 
macy as sophomores from the fall quarter of 
1933 through the summer quarter of 1942. 

The predictive variables used in this study 
were the same as those used in the study of 
entering freshmen, except that the pre- 
professional honor point ratio, instead of the 
high school percentile rank, was used as a 
measure of previous scholastic success. The 
criteria were different for this group in that 
“first year” stands for the freshman year for 
students who entered the College of Phar- 
macy as freshmen, but “first year” stands for 
the sophomore year for students who entered 
the College of Pharmacy as sophomores. That 
is, “first year” stands for the first year in the 
College of Pharmacy, regardless of whether a 
student entered as a freshman or as a soph- 
omore. 

There were 107 students who entered the 
College of Pharmacy as sophomores in the fall 
quarters from 1938 through 1941, who had 
complete data on the predictive variables, and 
who completed at least the first quarter of 
the sophomore year. Twenty-six of these students
entered in the fall of 1938, 24 in the
fall of 1939, 38 in the fall of 1940, and 19 in
the fall of 1941. The tests of homogeneity of 
means and standard deviations for the four 
groups on the different predictive variables 
and criteria indicated that the four groups 
were not significantly different from each 
other, and could be considered as random 
samples from the same population; all except 
three of the 42 tests of significance yielded a 
probability greater than .05, the three exceptions
yielding probabilities between .05
and .01.


The correlation and regression analyses 
indicated that the pre-professional honor 
point ratio; the Iowa Chemistry Aptitude 
Test, Part 2—paragraph 1; and the Iowa 
Chemistry Aptitude Test, Part 3 were, in gen- 
eral, more valuable than were the other vari- 
ables for predicting the success of this group 
of entering sophomores. The results of these 
analyses are presented briefly in the following 
paragraphs. 

The zero-order correlation coefficients be- 
tween the different predictive variables and 
the various selected criteria revealed that 
when the variables were considered as indi- 
vidual predictors, the pre-professional honor 
point ratio was the only one to be of value. 
The correlation coefficients between the pre- 
professional honor point ratio and the differ- 
ent honor point ratio criteria were all greater 
than .55, and those between the other pre- 
dictive variables and the criteria were usually 
below .35. When each of the predictive vari- 
ables was combined with the pre-professional 
honor point ratio in two-variable combina- 
tions for predicting the various criteria, the 
two-variable multiple correlation coefficients 
indicated that the Iowa Chemistry Aptitude 
Test, Part 2—paragraph 1 and the Iowa 
Chemistry Aptitude Test, Part 3 were more 
valuable in two-variable combinations with 
the pre-professional honor point ratio than 
were the other predictive variables. 

The standard partial regression coefficients 
indicated that the pre-professional honor 
point ratio; the Iowa Chemistry Aptitude 
Test, Part 2—paragraph 1; and the Iowa 
Chemistry Aptitude Test, Part 3 were more 
valuable than were the Pharmacy Mathe- 
matics Test II, the Johnson Science Survey 
Test III-A, and the Johnson Science Survey 
Test III-B when these six variables were com- 
bined for a six-variable multiple correlation 
and regression analysis for predicting the
different honor point ratio criteria. None of
the six variables in the combination were 
indicated to be of value for predicting the 
State Board Examination criteria. 


Four different combinations of three inde- 
pendent variables were studied for predicting 
the various criteria of success for this group 
of students who entered as sophomores. The 
four different combinations were as follows: 
the three-variable combination of the pre- 
professional honor point ratio, the Johnson 
Science Survey Test III-Total, and Part 3 
of the Iowa test; the three-variable combina- 
tion of the pre-professional honor point ratio, 
the Pharmacy Problem Solving Test, and 
Part 3 of the Iowa test; the three-variable 
combination of the pre-professional honor 
point ratio, Part 2 of the Iowa test, and Part 
3 of the Iowa test; and the three-variable 
combination of the pre-professional honor 
point ratio, Part 2—paragraph 1 of the Iowa 
test, and Part 3 of the Iowa test. In the first 
combination and in the second combination, 
the pre-professional honor point ratio was the 
only one of the three variables that was indi- 
cated to be of value for predicting the dif- 
ferent criteria. When Part 2 of the Iowa test 
was used in the combination in place of one 
of the other two variables, it was found that 
Part 3 of the Iowa test as well as the pre- 
professional honor point ratio was of value, 
and although Part 2 of the Iowa test did not 
have standard partial regression coefficients 
that were significant, it apparently added to 
the value of Part 3 of the Iowa test. The last 
of the four combinations was indicated to be 
the most valuable of the three-variable com- 
binations considered. 


Considering the last of the four three- 
variable combinations, that for predicting the 
various criteria of success from the pre- 
professional honor point ratio; the Iowa 
Chemistry Aptitude Test, Part 3; and the 
Iowa Chemistry Aptitude Test, Part 2—par- 
agraph 1, it was found that all three of the 
variables in the combination had standard 
partial regression coefficients that were sig- 
nificant on either the 1 per cent or the 5 per 
cent level for predicting five of the six honor 
point ratio criteria. The pre-professional 
honor point ratio was the only variable that 
had a significant regression coefficient for pre- 
dicting the total honor point ratio, and none 
of the variables had significant regression 
coefficients for predicting the two State Board
Examination criteria. The standard partial
regression coefficients for each of the three 
variables were very much the same for five 
of the six honor point ratio criteria. The betas 
for the pre-professional honor point ratio were 
.5778, .5581, .5828, .5529, .5258, and .5974
respectively, for the six honor point ratio 
criteria; the betas for the Iowa Chemistry 
Aptitude Test, Part 3 were .1884, .1772, 
.1878, .1207 (P > .05), .2158, and .1974 respectively,
for the six honor point ratio criteria;
and the betas for the Iowa Chemistry
Aptitude Test, Part 2—paragraph 1 were 
-.1928, -.1864, -.2006, -.1600 (P > .05),
-.2150, and -.2149 respectively, for the six
honor point ratio criteria. The multiple cor- 
relation coefficients for these three-variable 
combinations were .650, .626, .655, .594, .620, 
and .678 respectively, for each of the six 
honor point ratio criteria. 


Three of these regression equations were 
chosen for consideration for practical use for 
estimating the future success of students 
applying for admission to or actually enroll- 
ing in the College of Pharmacy as soph- 
omores, the choice being made on the basis 
of the correlation and regression data and the 
characteristics of the various criteria. The 
first equation which was chosen was that for 
predicting the first quarter first year honor 
point ratio from the three-variable combina- 
tion of the pre-professional honor point ratio; 
the Iowa Chemistry Aptitude Test, Part 3; 
and the Iowa Chemistry Aptitude Test, Part 
2—paragraph 1. The second equation was 
that for predicting the first year honor point 
ratio (based on one, two, or three quarters of 
work in the first or sophomore year) from
the same three-variable combination. The 
third equation was that for predicting the 
total honor point ratio for those earning a 
degree from the same three variables. 


As in the freshman study, the validity of 
the assumptions of linear regression and 
homoscedasticity was determined for each of 
the relationships involved in the regression 
equations selected to be used for practical 
application. The tests of significance provided 
practically no evidence to indicate that these 
assumptions were not valid for the relation- 
ships considered. All of the tests of homosce- 
dasticity indicated that the variances within 
arrays were equal for all of the relationships 
involved, and all except one of the tests of 
linearity indicated that the regressions were
linear in form; the only indication of nonlinearity
was for the regression of the Iowa
Chemistry Aptitude Test, Part 3 on the Iowa 
Chemistry Aptitude Test, Part 2—paragraph 
1, the probability being less than .01 that the
variance due to departures from linear regres- 
sion was significant. 


Before these selected regression equations 
developed on the experimental sample could 
be used as prediction formulas for other new 
groups of entering sophomores, it was neces- 
sary to determine whether the different groups 
of students entering the College of Pharmacy 
as sophomores were random samples from the 
same population. During the period of this 
investigation, ten different groups of students 
had entered the College of Pharmacy as 
sophomores, and these groups were used to 
test the assumption that the different entering 
groups were not significantly different from 
each other. The tests of homogeneity were 
done on five predictive variables, the three 
that were involved in the selected regression 
equations, and in addition on the Johnson 
Science Survey Test III-Total and the Phar- 
macy Mathematics Test II, because for many 
of the entering groups data were not available 
on the Iowa Chemistry Aptitude Test. The 
tests of homogeneity were done on only two 
of the three criteria that were involved in the 
selected regression equations as data on the 
total honor point ratio for those earning a 
degree were not available except for the 
groups included in the experimental sample. 
Data on the pre-professional honor point 
ratio were available for all ten of the groups; 
on the Pharmacy Mathematics Test II for 
eight of the ten groups; on the Johnson Sci- 
ence Survey Test III-Total for eight of the 
ten groups; on the Iowa Chemistry Aptitude 
Test, Part 3 for seven of the ten groups; on 
the Iowa Chemistry Aptitude Test, Part 2— 
paragraph 1 for five of the groups; on the 
first quarter first year honor point ratio for 
five of the groups; and on the first year honor 
point ratio for five of the groups. 


These different entering groups were tested 
for homogeneity of the mean scores on the 
predictive variables and criteria by the analysis
of variance technique and for homogeneity
of variances by the Welch-Nayer
technique. The tests of homogeneity, based on
five to ten different entering groups of sophomores,
for five predictive variables and two
criteria indicated that, although there was a
rather definite tendency for the 1935 entering
group to have lower mean scores on the pre- 
dictive variables, the different groups were 
probably sufficiently homogeneous to justify 
the tentative conclusion that they were ran- 
dom samples from the same population. Data 
were not available on the 1935 entering group 
for the criteria nor on the Iowa Chemistry 
Aptitude Test, Part 2—paragraph 1; there- 
fore, no information was obtained concerning 
this group on these factors. 
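
The test of homogeneity of means by the analysis of variance compares the variation between entering groups with the variation within them. The Python sketch below computes the F ratio for that comparison only; it does not attempt the Welch-Nayer test of variances. The function name and the scores are invented for illustration and are not data from the study.

```python
def one_way_anova_f(groups):
    """F ratio and degrees of freedom for a one-way analysis of variance.

    groups: a list of lists, one list of scores per entering group.
    Returns (F, df_between, df_within). The function name is hypothetical.
    """
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means about the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores about their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Invented scores for three hypothetical entering groups.
f_ratio, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f_ratio, 2), df_b, df_w)
```

A large F ratio, referred to the F distribution with the two degrees of freedom shown, would count against the assumption that the groups are random samples from the same population.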


It was indicated that the findings based on 
the experimental sample could be used validly 
for the other groups considered here, except 
possibly for the 1935 entering group; how- 
ever, a larger number of different entering 
groups would need to be considered before 
any very definite statement could be made 
concerning the validity of using the findings 
based on this experimental sample for pur- 
poses of guidance for groups of entering stu- 
dents other than those considered here. The 
evidence was probably sufficient to warrant 
the statement that the findings might be 
tentatively used for guidance purposes for 
other similar entering groups of sophomores 
not included in this study, until data were 
available on more entering groups, making 
possible the further investigation of this 
problem. 


The final step in this sophomore study was 
the conversion of the findings from the experi- 
mental sample into a form that could be 
readily used for practical guidance purposes. 
The same methods were used in this soph- 
omore study as were used in the freshman 
study. The three selected regression equations 
were converted to prediction formulas for use 
with raw scores and the method of using these 
formulas was illustrated. The use of the 
standard error of a specific predicted mean 
value and the “standard error of estimate” 
for interpreting most likely honor point ratios 
obtained from the prediction formulas was 
also illustrated. Probability tables were also 
determined which might be used to interpret 
the predicted or most likely honor point ratios 
obtained by using the prediction formulas. 


IMPLICATIONS 


This investigation was not prompted by 
any one specific problem in the College of 
Pharmacy. The College was interested not 
only in the problem of selective admissions, 


JOURNAL OF EXPERIMENTAL EDUCATION 


[Vol. 14, No. 3 


but also in efficient service to individuals 
through effective guidance of the students 
throughout the period of their registration in 
the College of Pharmacy. Furthermore, this 
interest was centered on students who were 
underachieving, overachieving, and achieving 
up to capacity, on all levels of ability. Tentative
findings obtained from part of the data
included in this study were used for guidance 
purposes during the period of this investiga- 
tion and were indicated to be of value. The 
present study, based on a larger number of 
entering groups of students, in which the data 
were analyzed with more refined techniques, 
and in which special attention was given to 
the valid application of the findings, has pro- 
vided considerably more information and 
should be useful as one of the bases for intelli- 
gent and effective guidance of students. 


To assume that this investigation was com- 
plete or that the findings were conclusive and 
the predictions based on them were infallible 
would be unwarranted. The prediction for- 
mulas were developed on groups of students, 
but their value lies in their applicability to 
individuals in other entering groups of stu- 
dents. The prediction formula provides a pre- 
dicted value which is a most likely value 
(honor point ratio in this case) for an indi- 
vidual in the sense that it is the expected 
average or mean for all individuals in the 
population with a given set of predictive 
scores. Methods for interpreting predicted 
mean or most likely honor point ratios for 
individuals as determined and illustrated in 
this report, should be of value to advisers of 
students. Predictions based on the findings 
from this investigation, used critically by 
advisers, should be an important aid in guid- 
ing individual students. 

The next few paragraphs will state briefly 
some of the various purposes for which pre- 
dictions for individuals may be used. In the 
first place, students applying for admission to 
the College of Pharmacy, who on the basis 
of their previous scholastic record and general 
college aptitude appear to be potentially poor 
students, might be given the test battery to 
aid in determining whether or not they should 
be admitted to the College of Pharmacy. If 
the predictions were low and there were on 
the basis of the probability tables, for ex- 
ample, only about 15 chances in 100 of a 
student’s earning an honor point ratio of 1.00 
(C) or better during the first quarter of his
first year, and only about 10 chances in 100
of his earning an honor point ratio of 1.00 or 
better during his first year in the College, 
then he probably should not be admitted to 
the College of Pharmacy, but assisted in find- 
ing some other field of study or activity for 
which he was better suited. If the predictions 
were such that there were, on the basis of the 
probability tables, about 35 chances in 100 
of a student’s earning an honor point ratio 
of 1.00 or better during the first quarter of 
his first year, and about 33 chances in 100 of 
his earning an honor point ratio of 1.00 or 
better during his first year in the College, he 
might be admitted on probation. 


Using predictions as one of the factors in 
determining the admission of students of 
doubtful ability is, however, only one of the 
ways in which they may be used. Most of the 
students take the test battery after they are 
admitted to the College of Pharmacy, and 
predictions probably have their greatest value 
in the use that might be made of them by the 
advisers and the administrative officer of the 
College. To facilitate the discussion, three 
groups of students will be considered, namely, 
those who, on the basis of predictions, have 
been indicated to be potentially poor, average, 
and superior students. 


Individuals who have been identified as 
potentially poor students might make very 
poor records, reasonably acceptable records, 
or average records, but there is only a slight 
possibility of their making superior records. 
If a low first quarter prediction for an indi- 
vidual is confirmed by a low record for the 
first quarter, and if that individual was also 
predicted to make a low record for the first 
year, he should be advised to withdraw from 
the College. Whether he should be advised to 
transfer to some other college of the Univer- 
sity or withdraw from the University would 
depend on his general level of ability, apti- 
tudes, and interests; he should be referred to 
the Student Counseling Bureau, or at least he 
should not be dropped from the College with- 
out some attempt being made to help him find 
how he can best use his potentialities. A stu- 
dent who has been predicted to be a low 
achiever for the first quarter of his first year 
and who makes only a reasonably satisfactory 
record during his first quarter should also be 
considered as in need of guidance. He should 
be encouraged for the record which he has 
achieved, but if he was predicted to make a
poor record during his first year and also
during the first quarter of his second year, he 
should be advised to apply himself diligently 
to his school work, and consider carefully the 
amount of time he gives to outside work and 
extracurricular activities. His interests and 
abilities in other fields of study and activities 
should also be considered in the event he does 
not continue to maintain or improve upon the 
reasonably acceptable record which he made 
during his first quarter. 


Students who have been identified as poten- 
tially average achievers should not be neg- 
lected when considering those in need of 
guidance. Some of them might make poor 
records during the first quarter of their first 
year, others fairly satisfactory or acceptable 
records, and still others good or superior rec- 
ords. Students who were predicted to be aver- 
age students and who underachieve in the 
College of Pharmacy should be cautioned the 
same as students who have been predicted to 
be poor students and who actually over- 
achieve; consideration should be given their 
study habits, their interest in pharmacy, the 
amount of their outside work, the amount of 
participation in extracurricular activities, 
their living conditions, their financial prob- 
lems, and other factors related to their adjust- 
ment to college work. Predictions for the first 
year and the first quarter of the second year 
should also be considered in advising such 
students. Some of these students might be 
requested to withdraw or transfer; others 
might be put on probation, some of them 
withdrawing subsequently and others making 
the adjustment and achieving an acceptable 
record. Individuals in this group who achieve
up to their predictions or overachieve should
be encouraged and watched; some of them
might become poor achievers or superior 
achievers, as the predictions are not infallible. 
Such students stand in need of guidance, even 
though their first quarter or first year records 
were satisfactory. 


The value of predictions for individuals 
who have been indicated as good or superior 
students should not be overlooked. In fact, 
some of the most significant uses of prediction 
formulas are those related to these students. 
Individuals who are potentially good or supe- 
rior students often underachieve, and would 
appear to be only good or average students. 
Predictions should be of value in identifying
these students soon after they enter the College,
so that they can be given advice and
guidance, and be helped to realize their 
potentialities. Considerable effort would be 
justified in helping such individuals to analyze 
and solve their problems. Many of the diffi- 
culties these students are encountering could 
probably be adjusted early in their scholastic 
careers, so that they could go on to make 
outstanding records in the College of Phar- 
macy. Some of these students might be found, 
however, to have a more vital interest in some 
related field of study, possibly chemistry or 
medicine; the possibility of a transfer should 
not be overlooked if the student is to achieve 
in accordance with his ability and become a 
well-adjusted individual. The importance of 
early identification of potentially superior 
students can not be overemphasized. These 
students, both those who underachieve and 
achieve up to expectations, should be encouraged
in their scholastic work, and their interest
in the profession of pharmacy developed.
They should be informed of the various
educational and professional possibilities in 
the field of pharmacy beyond the usual one 
of becoming a practicing registered pharma- 
cist. Some of these students might be advised 
to take the optional combined five-year course 
in Pharmacy and Business Administration, 
which is open only to students in the College 
of Pharmacy who are of better than average 
ability. Some of these students should become 
candidates for various honors, scholarships, 
teaching assistantships, research assistant- 
ships, and graduate study. Since it is particu- 
larly important that such students obtain a 
good professional and educational foundation, 
they should be given serious consideration and 
guidance from the time they enroll in the 
College. Only by such a definite policy will 
they be stimulated and enabled to use their 
capacities to the full and become an asset to 
the College, the University, and society.


UNDER SCHOOL CONDITIONS

INTRODUCTION

The purpose of this paper is to explain as 
simply as possible the appropriate techniques 
to use in conducting an experiment under 
school conditions. The first few pages deal 
with procedures which may be employed 
where an experiment is conducted in a single 
school. While the discussion may at times 
appear formidable, the actual handling of the 
data of such an experiment is not particularly 
difficult. The last few pages deal with the 
handling of the data for an experiment con- 
ducted in several schools. The procedure is 
more complex, but should not trouble anyone 
who has mastered the techniques for a single 
school and who follows the directions care- 
fully. 

Before considering the procedures to be 
used in conducting an experiment in a single 
school, it should be emphasized that careful 
definition of the compared methods of instruc- 
tion, adequate control of non-experimental 
factors, and valid measurement of achieve- 
ment, are of as great, if not greater, impor- 
tance than the statistical treatment of the 
data. The fact that most of the discussion has 
to do with the latter should not seem to mini- 
mize the importance of the former. 


CONDUCTING AN EXPERIMENT WITHIN 
A SCHOOL 


Let us suppose that the purpose of the 
experiment is to compare two or three differ- 
ent methods of instruction. The first step in 
conducting such an experiment is careful defi- 
nition of the compared methods. If the 
methods are described in detail, the descrip- 
tions serve as guides to instruction during the 
experiment. Such descriptions are also useful 
in preparing a report of the experiment so 
that others may know precisely what is being 
compared. 

* Stimulating criticism and a number of valuable suggestions
have been received from Prof. Karl J. Holzinger of the
University of Chicago, Prof. P. O. Johnson of the University
of Minnesota, Prof. E. Lindquist of the State University
of Iowa, Dean Walter S. Monroe of the University of ——,
Mr. Ledyard Tucker of the College Entrance Examination
Board, and Mr. Joseph J. Urbancek of the Chicago —


It should be planned to conduct the experi- 
ment over a long enough period of time that 
appreciable differences in achievement can 
occur. Tests should be constructed or selected 
which will serve as valid and reliable measures 
of achievement, both at the start and at the 
close of the experiment. If the initial and final 
tests are equivalent in difficulty, gains in 
achievement may be measured. If the initial 
and final tests are not equivalent, comparison 
is made on the basis of the scores of the final 
test, but it is always desirable to give an 
initial test. The initial scores may be used as 
a means of equating the classes participating 
in the experiment, or they may be used in 
making a statistical allowance for an initial 
lack of equivalence of the classes. 

The matching or equating of pupils may 
not be feasible since it usually involves the 
reassignment of pupils to classes. A major 
purpose of this paper is to show how it is 
possible to conduct an experiment with intact 
school classes and to make the statistical 
allowance referred to above. When this tech- 
nique is used, it is usually assumed that the 
pupils in the classes have been assigned to 
them in a purely random manner. If the experiment
is begun very soon after the opening
of the school year, and there is no ability 
grouping, the use of classes as they are may 
not be a serious violation of this assumption. 
The assignment of the pupils to the classes 
may have been essentially random. This 
would be the case where no special efforts are 
made to assign particular pupils to particular 
teachers. If, however, the experiment is 
planned during the semester preceding the 
one in which it is to be conducted, then it 
should be possible to assign the pupils to the 
participating classes purely at random. An 
excellent procedure to be used in making such 
a random assignment is described in the text 
by Lindquist (9) on pages 24-29 (see the 
bibliography). 
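
A random assignment of pupils to the participating classes can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure given by Lindquist: the pupil labels, the function name `assign_at_random`, and the seed are hypothetical.

```python
import random


def assign_at_random(pupils, n_classes, seed=None):
    """Shuffle the pupils, then deal them into n_classes groups of near-equal size."""
    rng = random.Random(seed)      # a fixed seed makes the assignment reproducible
    shuffled = list(pupils)        # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    # Dealing round-robin keeps class sizes within one pupil of each other.
    return [shuffled[i::n_classes] for i in range(n_classes)]


# Hypothetical roster of 60 pupils to be divided between two classes.
pupils = [f"pupil_{i:02d}" for i in range(1, 61)]
classes = assign_at_random(pupils, n_classes=2, seed=1946)
```

Because every ordering of the roster is equally likely, no pupil's placement depends on ability, teacher preference, or any other systematic factor.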

The decision to have one class taught by
Method A, another by Method B, and another
by Method C should not be made in
terms of the preferences or skill of the teachers.
It is to be hoped that the teachers do not
differ greatly in skill and that the assignment 
of the compared methods to the teachers is 
made at random. If the experiment is con- 
ducted by one teacher in two classes, the 
teacher should strive to teach with equal skill 
and zeal in both. 

The initial test may be an intelligence test. 
It may be an achievement test in the field of 
the experimental instruction. In order to 
make the best possible allowance for initial 
lack of equivalence of the groups (or to make 
the best kind of matching of classes) the 
initial test should correlate highly with the 
final test. Very often, high correlation is
secured by using as the initial test an
achievement test similar in character to the
final test. The initial test need not be equivalent
in difficulty to the final test. It may be
better to use a much easier test at the start 
and a harder test at the end. The initial test 
may be over subject matter covered by the 
pupils before the experiment is begun. The 
final test may be over the subject matter 
taught during the experiment. Both the initial 
and final tests should be long enough to be 
reliable. The scoring should be objective, or 
relatively objective. The testing conditions 
should be equally good in the different classes. 
Matching of pupils, or statistical allowance 
for lack of equivalence of classes, can be 
made on the basis of more than one test, but 
it is more difficult to do so and the increase 
in precision of the experiment is often not 
great enough to warrant the additional labor. 
Until one has gained considerable experience 
in handling experimental data, it is probably 
better to use a single initial test and to have 
that test an achievement test similar in char- 
acter to the final test as has been explained 
above. 

After the initial test has been given in each 
of the classes participating in the experiment, 
the teacher or teachers should follow the 
methods as defined for each of the classes. 
In an experimental comparison of methods of 
instruction, the same subject matter or mate- 
rials of instruction should be used in each of 
the classes. If a comparison is being made of 
curriculum materials rather than methods, 
the methods should be the same. On the other 
hand, if the comparison involves a combina- 
tion of methods and materials both of which 
differ from class to class, the nature of the 
combination should be described in the experimental
plan and should ultimately be
explained in any report made of the experi- 
ment. As the experiment progresses, changes 
may occur in the methods or materials used. 
If these vary from class to class it is espe- 
cially important that the nature of the vari- 
ation be ultimately reported. Efforts should 
be made to conduct the experimental instruc- 
tion in a way which does not have the aspects 
of a campaign. In other words, the pupils 
should not be motivated by the knowledge 
that they are participating in an experiment. 
There should be no competition between the 
various classes. Conditions should be com- 
parable to those characteristic of instruction 
under usual school conditions. 


HANDLING AND INTERPRETING THE DATA
OBTAINED FROM MATCHED GROUPS
OR CLASSES


In discussing the handling and the inter- 
pretation of experimental data, consideration 
will be given first to the case in which the 
pupils in two or more classes have been 
paired on the basis of their scores on a single 
initial test. If there are only two such classes 
only one comparison need be made, that be- 
tween Methods A and B. If there are three 
classes, three comparisons can be made, for 
example, between Method A and Method B, 
between Method A and Method C, and be- 
tween Method B and Method C. It will suffice 
to illustrate the comparison between two 
methods only. There will be two sets of initial 
scores and two sets of final scores. Let us
refer to the initial scores as $X_1$ in the class
which was taught by Method A and $X_2$ in
the class which was taught by Method B. Let
us refer to the final scores respectively as
$Y_1$ and $Y_2$. (If gains are measured by equivalent
tests, the individual gains may be referred
to as $Y_1$ and $Y_2$.) Prepare two tables,
one for each class, and list in the columns
headed $X_1$ and $Y_1$ of the first table and $X_2$
and $Y_2$ of the second table, the initial and
final scores of the pupils. (Both scores of a
given pupil are on the same line. Table I
illustrates this, but lists only a few of the
scores.)

Add the $X_1$ scores and divide by the number
of pupils in the class to obtain $M_{x_1}$, the
mean of the $X_1$ scores. Then subtract this
mean from each of the $X_1$ scores to obtain
the scores labeled $x_1$ and list these scores in
the second column. (Approximately half of
these $x_1$ scores will be negative.) List the
squares of the $x_1$ scores in the column headed
$x_1^2$. (All of the squares will be positive.) Add
the $x_1^2$ column to obtain the total to be designated
$\Sigma x_1^2$. In the same way obtain $M_{y_1}$ and
$\Sigma y_1^2$. Multiply each $x_1$ score by the $y_1$ score
of the same pupil and list in the column
headed $x_1 y_1$. Total this column to obtain
$\Sigma x_1 y_1$. (This procedure is illustrated in the
table by hypothetical data. Using $M_{x_1} = 42$
and $M_{y_1} = 51$ the reader may check some of
the entries.) Repeat this procedure in the case
of the table for the other class to obtain $M_{x_2}$,
$M_{y_2}$, $\Sigma x_2^2$, $\Sigma y_2^2$, and $\Sigma x_2 y_2$. (Because of pairing,
$M_{x_1}$ should equal $M_{x_2}$ and $\Sigma x_1^2$ should
equal $\Sigma x_2^2$. If approximately equal values of
$M_{x_1}$ and $M_{x_2}$ are not obtained, the procedure
described in the next section should be used.)

TABLE I
FICTITIOUS DATA FOR ONE OF TWO EQUIVALENT CLASSES
CLASS I

    $x_1^2$     $Y_1$
       16        68
      121        57
      361        58
       16        41
        0        45
      ...       ...
      169        64
        4        54

Totals: $\Sigma X_1 = 1,344$; $\Sigma x_1^2 = 2,342$; $\Sigma Y_1 = 1,632$;
$\Sigma y_1^2 = 3,116$; $\Sigma x_1 y_1 = 1,842$

$$M_{x_1} = \frac{1344}{32} = 42 \qquad M_{y_1} = \frac{1632}{32} = 51$$

$$\Sigma x^2 = \Sigma x_1^2 + \Sigma x_2^2$$

$$\Sigma y^2 = \Sigma y_1^2 + \Sigma y_2^2$$

$$\Sigma xy = \Sigma x_1 y_1 + \Sigma x_2 y_2$$

The three values thus obtained may next
be inserted in the following formula to secure
the correlation between the initial and final
scores of all of the pupils.

$$r_{xy} = \frac{\Sigma xy}{\sqrt{\Sigma x^2 \, \Sigma y^2}}$$

The coefficient of correlation thus obtained,
the means of the final scores of the two
classes $M_{y_1}$ and $M_{y_2}$, the $\Sigma y^2$ used in the
above formula, and the numbers of pupils in
each class $N_1$ and $N_2$ are then substituted in
the following formula from which “t” is computed.


The numerator is the difference between
the final means. (If $M_{y_2}$ is greater than $M_{y_1}$
the results favor the second group or class
rather than the first.) The denominator is the
measure of experimental error used for the
case of matched groups. For an experiment
involving two classes of 32 pupils each, or
61 degrees of freedom, $t$ must exceed 2.00 for
the results to be statistically significant at the
“five per cent level” and must exceed 2.66 to
be significant at the “one per cent level.”
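
The computation of the correlation and of $t$ can be sketched as follows. The formula for $t$ itself is not legible in this copy, so the sketch assumes the form $t = (M_{y_1} - M_{y_2}) / \sqrt{\frac{\Sigma y^2 (1 - r_{xy}^2)}{N_1 + N_2 - 3}\left(\frac{1}{N_1} + \frac{1}{N_2}\right)}$, which agrees with the 61 degrees of freedom quoted for two classes of 32 and with the quantity $\Sigma y^2(1 - r_{xy}^2)$ mentioned in footnote 4. The class 1 sums come from Table I; every class 2 figure and both final means passed to the function are hypothetical.

```python
import math


def matched_groups_t(mean_y1, mean_y2, sum_x2, sum_y2, sum_xy, n1, n2):
    """Return (r, t, df) under the ASSUMED matched groups formula described above."""
    r = sum_xy / math.sqrt(sum_x2 * sum_y2)           # correlation of initial and final scores
    df = n1 + n2 - 3                                  # assumed: 64 - 3 = 61 in the text's example
    residual_variance = sum_y2 * (1.0 - r * r) / df   # final-score variation not tied to initial scores
    std_error = math.sqrt(residual_variance * (1.0 / n1 + 1.0 / n2))
    return r, (mean_y1 - mean_y2) / std_error, df


# Class 1 deviation sums from Table I; class 2 values are invented for illustration.
sum_x2 = 2342 + 2342      # pairing makes the two initial sums of squares equal
sum_y2 = 3116 + 3000      # hypothetical second term
sum_xy = 1842 + 1800      # hypothetical second term
r, t, df = matched_groups_t(51.0, 47.0, sum_x2, sum_y2, sum_xy, 32, 32)
```

With these invented figures $t$ falls between 2.00 and 2.66, the range the text treats as significant at the five per cent level but not at the one per cent level.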

In interpretation we first formulate the
“null” hypothesis that no real difference exists.
Then, if the ratio of the difference to the
error of the difference, or $t$, equals or exceeds
2.66 we can infer that only, or less than, one
per cent of the time, in repeating the experiment
with similar groups, would we get a
difference as great or greater than the observed
difference as a result of the operation
of chance in the selection of pupils and as a
result of the operation of chance errors of
measurement. If important non-experimental
factors such as zeal or skill of the teachers
have been controlled, that is, have operated
equally in both groups and the measurements
of achievement are valid, we may conclude
that the difference can with reasonable assurance
be attributed to the difference in merit
of the two methods. We may generalize for
similar classes of pupils in this school that
Method A is better than Method B. If $t$ =
2.00 or is between 2.00 and 2.66 a similar
interpretation is made, but with less assurance.
If $t$ is exactly 2.00 the chances of
obtaining a difference as great or greater than
the observed difference through the operation
of chance are 1 in 20. While these odds are
not great enough to claim statistical significance,
if both methods are equally feasible so
far as effort or cost are concerned, it is sensible
to use the method yielding the greater
average achievement.

¹ The denominator of the formula for $t$ is essentially the
Lindquist-Wilks formula for the standard -

The basis of the type of interpretation de- 
scribed in the preceding paragraph may be 
more clear if we suppose that the experiment 
is repeated so many times using samples of 
pupils drawn from the same general popula- 
tion that one obtains a normal distribution 
of differences.² If chance alone operates in
creating the differences, the average difference
is zero and there are as many negative differences
as positive ones. The denominator of
the expression used in calculating $t$, the
standard error of the difference obtained in 
the actual experiment, is an estimate of the 
standard deviation of this normal distribution 
of differences. We may represent this distri- 
bution graphically as follows: 


Fig. 1. Curve of Distribution of Differences.

$\sigma_{\text{diff.}}$ represents the standard error of the
observed difference, or the standard deviation
of the hypothetical distribution of differences.
The shaded area represents one per cent of
the total area of the curve.
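
The size of the shaded area can be checked numerically. The sketch below is an illustration rather than part of the original procedure: it assumes the differences, divided by their standard error, follow Student's $t$ distribution with 60 degrees of freedom, and it integrates the density by Simpson's rule. The function names are hypothetical.

```python
import math


def t_density(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log(1 + t * t / df))


def two_sided_tail(t, df, steps=2000):
    """Chance of a ratio at least as large as t in either direction (steps must be even)."""
    h = t / steps
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)   # Simpson weights 4, 2, 4, ...
    area_zero_to_t = total * h / 3
    return 1.0 - 2.0 * area_zero_to_t                         # both tails together


p_at_2_00 = two_sided_tail(2.00, 60)   # close to 1 chance in 20
p_at_2_66 = two_sided_tail(2.66, 60)   # close to 1 chance in 100
```

The two results correspond to the “five per cent level” and the “one per cent level” used in the interpretation.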

Where chance alone is the cause of the
differences only one difference in one hundred
is large enough to extend from zero into the
shaded area at either end of the curve. Hence,
when $t$ is equal to or greater than 2.66 and
the degrees of freedom is 60, we infer with
confidence based on chances of 99 in 100
that the observed difference is due to some
cause other than chance. If non-experimental
factors have been controlled we ascribe the
difference to the variation in methods of
instruction.

² The samples selected should have the same average and
variability of initial scores, although the final achievement
means and, hence, the final achievement differences will vary.
See footnote 7.

The procedure described above also applies 
to groups which have been matched merely 
with respect to the means and standard devi- 
ations of the initial scores. Pairing is one way 
of securing equal means and standard devia- 
tions. Another way is to eliminate cases from 
one group which cause the mean and stand- 
ard deviation of that group to differ from 
those of the other group. In this case, the 
groups need not be of equal size. Where 
pupils have been paired, the standard error 
of the difference between the final means may 
also be obtained by calculating the standard 
deviation of the individual differences between 
the final scores of the paired pupils and 
dividing this standard deviation by the square
root of one less than the number of pairs, or
one can use the
“long” formula for the standard error of a 
difference which involves the correlation be- 
tween the final scores of the paired pupils. 
Both of these procedures will give the same 
standard error,³ and the same result will also
be obtained using the matched groups
formula, if the standard deviations (or standard
errors of the means) of the final scores
of both groups are equal, if the correlations
between initial and final scores of both
groups are equal, and the product of these 
correlations equals the correlation between 
the final scores of the paired pupils. The 
last mentioned condition occurs, if, as should 
usually be the case, all of the correla- 
tion between the final scores is due to what 
the pupils in each pair had in common and 
was measured at the start of the experiment 
by their equal scores on the initial test. For 
further information about these matters see 
the articles by Engelhart (1), McNemar (11), 
Shen (14), and Thorndike (16) referred to 
in the bibliography. Thorndike states that the
matched groups formula should be used where
groups are matched as wholes and the long
formula where pupils are paired. The matched
groups formula can be used in either case,
and seems preferable since one obtains, in the
correlation between initial and final scores, an
indication of the efficiency of the matching
criterion. McNemar suggests the substitution
in the long formula of the square of the correlation
between initial and final scores in
place of the correlation between final scores.
The article by Thorndike should definitely be
read by experimenters whose groups are
chosen from different populations, for example,
students who have had or who have
not had Latin in an experiment on methods
of teaching a modern language.

³ The squared standard errors of the means used in the
long formula need to be unbiased estimates of the population
variance. In other words, in computing these standard
errors, the standard deviations of the distributions of final
scores are divided by $\sqrt{N - 1}$ rather than by $\sqrt{N}$.

Another reason for describing a procedure 
here which involves the use of the matched 
groups formula is that the description con- 
tributes to understanding of the covariance 
technique used where groups are not equiv- 
alent at the start of the experiment. The 
matched groups formula and the covariance 
technique give the same results for two 
groups which are equivalent. 
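
The agreement between the two procedures for paired pupils described earlier, the standard deviation of the individual differences and the “long” formula, can be verified on any set of paired scores. The sketch below assumes the long formula takes the usual form $\sqrt{\sigma_{M_1}^2 + \sigma_{M_2}^2 - 2 r \sigma_{M_1} \sigma_{M_2}}$, with each standard error obtained by dividing a standard deviation by $\sqrt{N - 1}$; the scores are hypothetical.

```python
import math


def sd(values):
    """Standard deviation with divisor N; the standard errors below then divide by sqrt(N - 1)."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def se_from_differences(y1, y2):
    """Standard error of the mean difference from the paired differences themselves."""
    diffs = [a - b for a, b in zip(y1, y2)]
    return sd(diffs) / math.sqrt(len(diffs) - 1)


def se_long_formula(y1, y2):
    """Standard error from the assumed long formula, using the correlation of paired final scores."""
    n = len(y1)
    m1, m2 = sum(y1) / n, sum(y2) / n
    s1, s2 = sd(y1), sd(y2)
    r = sum((a - m1) * (b - m2) for a, b in zip(y1, y2)) / (n * s1 * s2)
    se1, se2 = s1 / math.sqrt(n - 1), s2 / math.sqrt(n - 1)
    return math.sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)


# Hypothetical final scores of eight pairs of pupils.
y1 = [68, 57, 58, 41, 45, 64, 54, 49]
y2 = [61, 55, 50, 44, 40, 59, 53, 42]
```

Calling `se_from_differences(y1, y2)` and `se_long_formula(y1, y2)` returns the same value apart from rounding, since the variance of a difference equals the sum of the two variances less twice the covariance.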


THE TESTING OF FUNDAMENTAL
ASSUMPTIONS


Before using the matched groups formula,
the careful experimenter will test the assumption
of homogeneity of variance, i.e., that the
variability of the final scores of the compared
groups does not differ from class to class more
than could be expected as a result of chance.
A simple means of making this test is to
divide $\Sigma y_1^2$ by one less than the number of
pupils in the class and do the same for $\Sigma y_2^2$.
Divide the larger of the values thus obtained
by the smaller and test the significance of this
ratio by consulting a “Table for F” which
may be found in the texts by Goulden (4),
pages 269-272, Snedecor (15), pages 222-225,
and Lindquist (9), pages 62-65. The
degrees of freedom for both variances are the
same, one less than the number of pupils in
each class. Snedecor states that the tabular
probabilities should be doubled, i.e., values
given for the one per cent level, in this instance,
are values for the two per cent
level. If this ratio is significant, it may be
possible to secure homogeneity of variance
by transforming the data by means of somewhat
elaborate mathematical procedures.
However, if the $t$ test is highly significant,
and irrelevant factors are adequately controlled,
for practical purposes, the experimenter
is usually justified in concluding that
there is a real difference in the final means.
A significant F ratio may also indicate that
the methods differ in their effect on the variability
of the final achievement scores.
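
The variance ratio just described can be sketched as follows. The value 3,116 for class 1 is taken from Table I; the value 2,480 for class 2 is hypothetical, and the function name is an illustrative choice.

```python
def variance_ratio(sum_y1_sq, n1, sum_y2_sq, n2):
    """Return (F, df of larger variance, df of smaller variance) for two classes.

    sum_y1_sq and sum_y2_sq are sums of squared deviations of the final scores.
    """
    v1 = sum_y1_sq / (n1 - 1)
    v2 = sum_y2_sq / (n2 - 1)
    if v1 >= v2:                         # the larger variance always goes in the numerator
        return v1 / v2, n1 - 1, n2 - 1
    return v2 / v1, n2 - 1, n1 - 1


ratio, df_larger, df_smaller = variance_ratio(3116, 32, 2480, 32)
```

The resulting ratio is then compared with the tabled value of F for 31 and 31 degrees of freedom, remembering that the tabular probabilities are to be doubled.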


The same test of homogeneity of variance
may also be applied before using the procedure
described in the following paragraphs for
groups not equivalent at the start of the
experiment. ($\Sigma y_1^2$, $\Sigma y_2^2$, $\Sigma y_3^2$, etc., should be
calculated as explained with reference to
Table I, or as explained in footnote 4.) Where
there are more than two or three classes, or
the experiment is conducted in several
schools, a more precise test of the homogeneity
of variance may be used. Information
with respect to a test devised by Bartlett can
be found in the text by Rider (12), pages
102-103, in the text by Lindquist, pages
99-100 and 125-126, and in the text by
Snedecor, pages 249-252. Another test of the
homogeneity of variance is the $L_1$ test devised
by Pearson and Neyman and modified by
Welch. This test and also Bartlett's test are
described and illustrated in an article by
Tsao (18). The reader interested in methods
of conducting an experiment in several schools
should also study the article by Godard and
Lindquist (3).

Another assumption is that the correlations
between initial and final scores are not significantly
different from group to group, or
expressed in another way, the regressions of
final on initial scores are fundamentally the
same from group to group. This assumption
may be tested as follows: For each group or
class calculate

$$\Sigma y^2 - \frac{(\Sigma xy)^2}{\Sigma x^2}$$

and sum the values thus obtained. Let us call this sum
“A.” Next obtain the sums of $\Sigma x_1^2$, $\Sigma x_2^2$, etc.,
$\Sigma y_1^2$, $\Sigma y_2^2$, etc., and $\Sigma x_1 y_1$, $\Sigma x_2 y_2$, etc. Substitute
these sums in the formula just given to
obtain a value which we will call “B.” Subtract
A from B to obtain “C.” Divide C by
its degrees of freedom, one less than the number
of classes, to obtain the variance, $V_1$.
Divide A by its degrees of freedom, the total
number of pupils minus twice the number of
classes, to obtain the variance, $V_2$. Compute

$$F = \frac{V_1}{V_2}$$

and consult a table of F values to
determine whether or not the ratio is significant,
using the degrees of freedom specified.
If the ratio is not significant, the assumption
of homogeneity of regression is satisfied.⁴
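
The steps for obtaining A, B, C, and the ratio of $V_1$ to $V_2$ can be sketched as below. The deviation sums for class 1 are those of Table I; those for class 2 are hypothetical, as are the function names.

```python
def residual_ss(sum_x2, sum_y2, sum_xy):
    """Sum of squares of final scores left after allowing for the initial scores."""
    return sum_y2 - sum_xy ** 2 / sum_x2


def regression_homogeneity_f(groups, total_pupils):
    """groups holds one (sum_x2, sum_y2, sum_xy) tuple of deviation sums per class."""
    k = len(groups)
    a = sum(residual_ss(*g) for g in groups)               # "A": class by class, then summed
    pooled = [sum(g[i] for g in groups) for i in range(3)]
    b = residual_ss(*pooled)                               # "B": same formula on the summed values
    c = b - a                                              # "C": variation due to unequal regressions
    v1 = c / (k - 1)
    v2 = a / (total_pupils - 2 * k)
    return a, b, c, v1 / v2


# Class 1 from Table I; class 2 invented for illustration.
groups = [(2342, 3116, 1842), (2342, 3000, 1800)]
a, b, c, f_ratio = regression_homogeneity_f(groups, total_pupils=64)
```

Here the ratio is far below 1, so no table need be consulted and the assumption of homogeneity of regression is treated as satisfied for these figures.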

The same assumptions should be tested 
before applying the procedures described in 
the following paragraphs. Other assumptions 
include linearity of regression and normality 
of the distributions of “adjusted” final scores. 
These assumptions are seldom tested in edu- 
cational research, although appropriate sta- 
tistical techniques are described in any good 
text in statistical methods. 


⁴ If $V_1$ is equal to or less than $V_2$ there is no need to consult
the table. The ratio is not significant. B is equal to
$\Sigma y^2 (1 - r_{xy}^2)$ of the matched groups formula in the case of
... of variation of the
scores which are independent of the initial
regression line.


JOURNAL OF EXPERIMENTAL EDUCATION

[Vol. 14, No. 3


HANDLING AND INTERPRETING THE DATA
WHERE THE GROUPS OR CLASSES ARE
NOT INITIALLY EQUIVALENT


In order to lay a basis for understanding
the statistical treatment of data in the case
where the groups or classes are not initially
equivalent, let us suppose that numerous
samples of pupils have been drawn at random
from the same general population. Two estimates
can be made of the variability, or
spread, of the general population. These estimates
could be in terms of the standard deviation,
but it is customary to use the square
of the standard deviation, or “variance”
rather than the standard deviation itself. One
such squared standard deviation can be
obtained by calculating the squared standard
deviations of all of the samples and averaging
them. This estimate of the population variability
is called the “within groups variance.”
The other estimate is obtained by tallying
the distribution of means of all the samples,
calculating the squared standard deviation of
this distribution, and multiplying it by N, the
number of pupils in each sample.⁵ This estimate
is called the “between groups variance.”
If the samples have been drawn at random,
these two estimates of the population variability
will be approximately equal. If some factors other
than chance have operated the estimates will
not be equal. Usually, the “between groups
variance” will be greater than the “within
groups variance.” The ratio between the two
variances is called “F” and its interpretation
is similar to that of “t.” (If there are only
two initially equivalent groups and the appropriate
formulas are used $t^2$ is equal to F and
the chances of significance are the same
although looked up in different statistical
tables.) If the groups are not initially equivalent
the ratio to compute is F. The means of
the final scores or the mean gains will need
to be adjusted for the initial lack of equivalence.
The variance between groups and the
variance within groups will also need to be
adjusted in terms of the relation between the
initial and final scores. The method of doing
this is comparatively simple and can best be
explained by showing the calculations that
would be performed on actual data.
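
Before any adjustment is made, the two estimates of the population variance and their ratio can be sketched for samples of equal size. The sketch assumes each squared standard deviation is computed with its degrees of freedom as divisor (one less than the number of scores, or one less than the number of samples); the scores are hypothetical.

```python
def variance_estimates(samples):
    """Return (within groups variance, between groups variance, F) for equal-sized samples."""
    k = len(samples)
    n = len(samples[0])
    means = [sum(s) / n for s in samples]
    # Within groups: average of the squared standard deviations of the samples.
    within = sum(
        sum((v - m) ** 2 for v in s) / (n - 1) for s, m in zip(samples, means)
    ) / k
    # Between groups: squared standard deviation of the means, multiplied by n.
    grand = sum(means) / k
    between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    return within, between, between / within


# Three hypothetical samples of five pupils each.
samples = [[12, 15, 11, 14, 13], [16, 18, 15, 17, 19], [10, 9, 12, 11, 8]]
within, between, f_ratio = variance_estimates(samples)
```

In this invented example the sample means differ far more than chance would suggest, so the between groups variance greatly exceeds the within groups variance.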


Let us assume that there are 30 pupils in 
each group or class. (In order to use the pro- 
cedure described, the classes should be of 
equal size.⁶ If they are not, discard the data
of a few pupils in one of the classes selecting 
such pupils at random.) Let us suppose that 
the data for both (or all) classes have been 
entered in a table similar to Table II and that 
the indicated sums of scores, sums of squares 
of scores, and sums of products of scores have 
been obtained. (Only a few lines of the table 
are given. The test scores and totals will 
usually be larger than those given. X, and X, 
stand for initial scores and Y, and Y, for 
final scores.) 

The assumptions of homogeneity of variance
and of homogeneity of regression should
be tested by methods described or referred to
in the preceding section. The totals of the
columns of Table II may be substituted in
the equations mentioned in the last sentence
of footnote 4. $N_1$ is the number of pupils in
each class.

⁵ The standard error of a mean is the standard deviation of
such a hypothetical distribution of means. It is usually calculated
from the standard deviation of a sample by use of
the familiar formula:

$$\sigma_M = \frac{\sigma}{\sqrt{N}}$$

If we knew the standard deviation of the population rather
than the standard deviation of the sample we could get a more
accurate value of $\sigma_M$ by use of the same formula. On the
other hand, if we calculate $\sigma_M$ directly from the distribution
of means, just as if the means were ordinary scores, then
$N\sigma_M^2$ should equal $\sigma_{pop}^2$, the squared standard deviation of
the population. (Square both sides of the above equation and
clear fractions.)

⁶ Tsao (17) has contributed a method of handling experimental
data where the classes or sub-groups are varying


March, 1946]

EXPERIMENTATION UNDER SCHOOL CONDITIONS


TABLE II
FICTITIOUS DATA FOR TWO NON-EQUIVALENT CLASSES

Class 1 (Method A), columns $X_1$, $X_1^2$, $Y_1$, $Y_1^2$, $X_1 Y_1$:
    $\Sigma X_1 = 482$    $\Sigma Y_1 = 734$    $\Sigma Y_1^2 = 18,896$    $\Sigma X_1 Y_1 = 11,765$

Class 2 (Method B), columns $X_2$, $X_2^2$, $Y_2$, $Y_2^2$, $X_2 Y_2$:
    $\Sigma X_2 = 326$    $\Sigma Y_2 = 723$    $\Sigma Y_2^2 = 18,621$


Data for both (or all) classes:

    $\Sigma X = 808$         $\Sigma Y^2 = 37,517$
    $\Sigma Y = 1,457$       $\Sigma XY = 21,468$
    $\Sigma X^2 = 13,777$    $N = 60$


If there were more than two groups or
classes the data would be similarly treated
and combined.


In order to obtain the adjusted variance
between groups and the adjusted variance
within groups, both of which are analogous to
squared standard deviations, we need first to
convert the sums of squares of scores and the
sum of products to sums of squares and a sum
of products in which the scores are expressed
as deviations from the means of the scores.
(The sum of the squares of scores expressed
as deviations from a mean, divided by the
total number of scores, is the squared
standard deviation of the scores, or a variance.
The difference here is that the division is by
degrees of freedom rather than N, and
adjustment is made for the relation between
initial and final scores.) The sums of squares
and products expressed as deviations are
obtained for “total” or all pupils, for between
groups, and for within groups. These sums of
squares and products are classified under the
heads $\Sigma x^2$, $\Sigma y^2$, and $\Sigma xy$, using small
letters. We first calculate the following
correction terms:


For X

$$\frac{(\Sigma X)^2}{N} = \frac{808^2}{60} = 10{,}881.07$$

For Y

$$\frac{(\Sigma Y)^2}{N} = \frac{1{,}457^2}{60} = 35{,}380.81$$

For XY

$$\frac{\Sigma X\,\Sigma Y}{N} = \frac{808 \cdot 1{,}457}{60} = 19{,}620.93$$

$\Sigma x^2$, $\Sigma y^2$, and $\Sigma xy$ for total

$$\Sigma x^2 = \Sigma X^2 - \frac{(\Sigma X)^2}{N} = 13{,}777 - 10{,}881.07 = 2{,}895.93$$

$$\Sigma y^2 = \Sigma Y^2 - \frac{(\Sigma Y)^2}{N} = 37{,}517 - 35{,}380.81 = 2{,}136.19$$

$$\Sigma xy = \Sigma XY - \frac{\Sigma X\,\Sigma Y}{N} = 21{,}468 - 19{,}620.93 = 1{,}847.07$$

$\Sigma x^2$, $\Sigma y^2$, and $\Sigma xy$ for between groups

$$\Sigma x^2 = \frac{(\Sigma X_1)^2 + (\Sigma X_2)^2}{N_1} - \frac{(\Sigma X)^2}{N} = \frac{482^2 + 326^2}{30} - 10{,}881.07 = 405.59$$

$$\Sigma y^2 = \frac{(\Sigma Y_1)^2 + (\Sigma Y_2)^2}{N_1} - \frac{(\Sigma Y)^2}{N} = \frac{734^2 + 723^2}{30} - 35{,}380.81 = 2.02$$

$$\Sigma xy = \frac{\Sigma X_1\,\Sigma Y_1 + \Sigma X_2\,\Sigma Y_2}{N_1} - \frac{\Sigma X\,\Sigma Y}{N} = \frac{482 \cdot 734 + 326 \cdot 723}{30} - 19{,}620.93 = 28.60$$

It should be evident to the reader how the
above formulas can be extended to the case
of more than two groups or classes. In the
case of three groups $(\Sigma X_3)^2$, $(\Sigma Y_3)^2$, and
$\Sigma X_3\,\Sigma Y_3$ would be added to the appropriate
numerators. $N_1$ is the number of pupils in
each group.

$\Sigma x^2$, $\Sigma y^2$, and $\Sigma xy$ for within groups

These values are obtained by subtracting
the between groups values from the values for
total.

$\Sigma x^2 = 2{,}895.93 - 405.59 = 2{,}490.34$

$\Sigma y^2 = 2{,}136.19 - 2.02 = 2{,}134.17$

$\Sigma xy = 1{,}847.07 - 28.60 = 1{,}818.47$
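
The three sets of sums of squares and products can be checked mechanically. The following is a minimal sketch in Python that works only from the totals quoted in the text; the variable names are illustrative, and the class size of 30 is taken from the divisors used in the between groups calculation.

```python
# Totals for the illustrative two-class data (N = 60 pupils, 30 per class).
N = 60
N1 = 30                       # pupils in each class
sum_X, sum_Y = 808, 1457      # sums of initial (X) and final (Y) scores
sum_X2, sum_Y2, sum_XY = 13777, 37517, 21468
class_sum_X = [482, 326]      # per-class sums of X
class_sum_Y = [734, 723]      # per-class sums of Y

# Correction terms.
corr_x = sum_X ** 2 / N
corr_y = sum_Y ** 2 / N
corr_xy = sum_X * sum_Y / N

# Sums of squares and products expressed as deviations, for total.
total = {
    "x2": sum_X2 - corr_x,
    "y2": sum_Y2 - corr_y,
    "xy": sum_XY - corr_xy,
}

# The same quantities for between groups, from the class sums.
between = {
    "x2": sum(sx ** 2 for sx in class_sum_X) / N1 - corr_x,
    "y2": sum(sy ** 2 for sy in class_sum_Y) / N1 - corr_y,
    "xy": sum(sx * sy for sx, sy in zip(class_sum_X, class_sum_Y)) / N1 - corr_xy,
}

# Within groups is obtained by subtraction.
within = {key: total[key] - between[key] for key in total}

for name, part in (("total", total), ("between", between), ("within", within)):
    print(name, {key: round(value, 2) for key, value in part.items()})
```

Because the sketch carries full precision, its results may differ from the printed figures by a unit in the second decimal place.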


The sum of squares of the final scores for
total is next adjusted in terms of the
relationship between the initial and final
scores. A similar adjustment is made for the
sum of squares for within groups.⁷


* The reason making this adjustment and also for using 
in the matched groups formula may 


in the formula given above 


of squares also refer 


is not related to, or is independent of, the variation in the 
initial scores. 


for a single r of groups, . 
exist for the hypothesined numerous 


hardly expect the achieve- 
high. In other words, the variability in 
because of the lack of vari- 


The above discussion applies also 
ment test is used in matching. It also applies to 
the start of an 


JOURNAL OF EXPERIMENTAL EDUCATION 


[Vol. 14, No. 3 


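These adjustments are as follows:

$$\Sigma y^2 - \frac{(\Sigma xy)^2}{\Sigma x^2} = 2{,}136.19 - \frac{(1{,}847.07)^2}{2{,}895.93} = 958.10$$

Adjusted sum of squares for total

$$\Sigma y^2 - \frac{(\Sigma xy)^2}{\Sigma x^2} = 2{,}134.17 - \frac{(1{,}818.47)^2}{2{,}490.34} = 806.31$$

Adjusted sum of squares for within groups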


The adjusted sum of squares for between
groups is obtained by subtracting the
adjusted sum of squares for within groups from
the adjusted sum of squares for total.⁸ It is
called by some authorities a reduced sum of
squares.

$958.10 - 806.31 = 151.79$   Reduced sum of squares for between groups


The reduced sum of squares for between 
groups is divided by one less than the num- 
ber of groups to obtain the reduced variance 
between groups. In this case the division is 
by one. The adjusted sum of squares for 
within groups is divided by three less than 
the total number of pupils in order to obtain 
the adjusted within groups variance 14.15. 
(One less for each group and one for the cor- 
relation. If there were three groups, the de- 
grees of freedom for within groups would be 
four less than the total number of pupils.) 
The F ratio is the ratio between the reduced 
between groups variance and the adjusted 
within groups variance. Most of the above is
summarized in Table III.
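
The adjustment and the F ratio described above can be sketched in a few lines of Python. This is only an illustration using the rounded sums of squares and products quoted in the text; the function and variable names are assumptions introduced for the example.

```python
def adjusted_ss(sum_y2, sum_xy, sum_x2):
    """Sum of squares of final scores with the part related to X removed."""
    return sum_y2 - sum_xy ** 2 / sum_x2

n_pupils, n_groups = 60, 2

adj_total = adjusted_ss(2136.19, 1847.07, 2895.93)
adj_within = adjusted_ss(2134.17, 1818.47, 2490.34)
reduced_between = adj_total - adj_within

df_between = n_groups - 1             # one less than the number of groups
df_within = n_pupils - n_groups - 1   # one per group and one for the correlation

var_between = reduced_between / df_between
var_within = adj_within / df_within
F = var_between / var_within

print(round(adj_total, 2), round(adj_within, 2), round(reduced_between, 2))
print(df_between, df_within, round(var_within, 2), round(F, 2))
```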

As in the case of the equated groups, we
first hypothesize that the true difference
between the groups is zero. On consulting a
table of values of F⁹ for various degrees of
freedom, we find for the given numbers of
degrees, 1 and 57, that F must equal or
exceed 4.01 to be significant at the five per cent
level and must equal or exceed 7.10 to be
significant at the one per cent level. Since the
obtained F of 10.73 exceeds 7.10, we infer
that the probability is less than one
in a hundred that the difference between
these groups is due to the operation of chance.
If non-experimental factors have been ade- 
quately controlled, we may ascribe the dif- 
ference in achievement, as measured by the 
be todited {a handling ‘certeln types of efwcational data. 


⁹ Such tables are given in the texts by Goulden (4),
Lindquist (9), and Snedecor (15) referred to in the bibliography.


TABLE III
ANALYSIS OF COVARIANCE AND TEST OF SIGNIFICANCE OF ADJUSTED MEANS OF FINAL SCORES

                  Sums of Squares and Products      Adjusted or   Degrees of   Adjusted or
                                                    Reduced Sum   Freedom      Reduced
                  $\Sigma x^2$   $\Sigma xy$   $\Sigma y^2$      of Squares                 Variance

Between Groups      405.59      28.60       2.02      151.79          1         151.79
Within Groups     2,490.34   1,818.47   2,134.17      806.31         57          14.15
Total             2,895.93   1,847.07   2,136.19      958.10

$$F = \frac{151.79}{14.15} = 10.73$$


final test and adjusted as illustrated below, 
to the difference in methods of instruction. 

The unadjusted initial and final means of
the two groups are as follows:

                 Initial mean    Final mean
First group         16.07          24.47
Second group        10.87          24.10


The general mean of both groups on the
initial test is

$$M_x = \frac{\Sigma X}{N} = \frac{808}{60} = 13.47$$

The initial mean of the first group is 2.60 above
the general mean while the initial mean of the
second group is 2.60 below the general mean.
If the initial and final tests were perfectly
correlated and scores numerically equal on
either test were actually equal we could
subtract 2.60 from the final mean of the first
group and add 2.60 to the final mean of the
second group to correct for the initial lack of
equivalence of the groups. Since this
condition does not exist, the value 2.60 must be
multiplied by the regression coefficient b
which takes into account both the actual lack
of perfect correlation and any difference in
value of the scores on the two tests.

$$b = \frac{\Sigma xy}{\Sigma x^2} = \frac{1818.47}{2490.34} = .73$$

The values of $\Sigma xy$ and $\Sigma x^2$ are those
computed for “within groups.” Multiplying 2.60
by .73 we obtain 1.90.

$24.47 - 1.90 = 22.57$   Adjusted final mean of the first group

$24.10 + 1.90 = 26.00$   Adjusted final mean of the second group

The difference in adjusted means, 3.43,
favors the second and initially inferior group.
We have already established the fact that this
difference is statistically significant.
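
The adjustment of the two final means may be sketched as follows. The example is a minimal illustration built from the class totals and within groups values quoted in the text; the names used are hypothetical.

```python
# Within groups sum of products and sum of squares of the initial scores.
within_xy, within_x2 = 1818.47, 2490.34
b = within_xy / within_x2            # regression coefficient

general_mean_x = 808 / 60            # general mean on the initial test

# Unadjusted means from the class totals (30 pupils per class).
groups = {
    "first": {"mean_x": 482 / 30, "mean_y": 734 / 30},
    "second": {"mean_x": 326 / 30, "mean_y": 723 / 30},
}

def adjusted_final_mean(mean_y, mean_x, general_mean_x, b):
    """Final mean corrected for the group's departure from the general initial mean."""
    return mean_y - b * (mean_x - general_mean_x)

adjusted = {
    name: adjusted_final_mean(g["mean_y"], g["mean_x"], general_mean_x, b)
    for name, g in groups.items()
}
difference = adjusted["second"] - adjusted["first"]

print(round(b, 2), {k: round(v, 2) for k, v in adjusted.items()}, round(difference, 2))
```

Retaining the sign of each difference, as the function does, means the same expression serves for groups above and below the general mean.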

If the initial and final tests had been equiv- 
alent forms and gains were computed and 
treated in the same way as the final test 
scores, exactly the same value would have 
been obtained for F and the difference in 
adjusted average gains would exactly equal 
the difference in adjusted final means. 


HANDLING THE DATA WHERE MORE THAN
TWO CLASSES ARE USED IN THE
SAME SCHOOL


If more than two groups or classes are used
some modifications are necessary in handling
the data. In obtaining $\Sigma X$, $\Sigma X^2$, $\Sigma Y$, $\Sigma Y^2$,
and $\Sigma XY$, the appropriate sums for each of
the groups are added. In computing $\Sigma x^2$, $\Sigma y^2$,
and $\Sigma xy$ for “between groups” more terms
such as $(\Sigma X_3)^2$, $(\Sigma Y_3)^2$, and $\Sigma X_3\,\Sigma Y_3$ are
added to the formulas. The number of
degrees of freedom for “between groups” is one
less than the number of groups or classes.
The number of degrees of freedom for
“within groups” will be $N - n - 1$ where
N is the total number of pupils and n is
the number of groups. For example, had
there been 5 groups the degrees of freedom
would have been 4 and 54 respectively.
In the case of more than two groups a
significant F shows that the differences between
the groups are, with a high degree of
probability, to be attributed to factors other than
chance, possibly to the variations in methods
of instruction compared. Adjustments of the
final means (or average gains) are made in
much the same way. However, since the
differences in initial means from the initial
general mean will not be equal and opposite
in sign as in the case of two groups, it will
be necessary to make the corrections as
follows, retaining the signs of these differences.
For example:

Final mean $- (b)(+\text{ difference})$ = adjusted final mean

Final mean $- (b)(-\text{ difference})$ = adjusted final mean


The adjustment for the two groups could
have been written in this way:


$24.47 - (.73)(+2.60) = 22.57$
$24.10 - (.73)(-2.60) = 26.00$


The differences in adjusted final means, or
adjusted average gains, may be compared and
these differences tested for significance by
means of the t test. Some of the differences
may be significant while others are not. For
example, let us suppose that three methods
A, B, and C have been compared. Then there
are three differences: one between A and B,
one between A and C, and one between B
and C. It may be found that A and B are of
approximately equal merit while C is of much
greater or much less merit. The following
formula may be used in computing the
denominator of the t ratio. (The numerator is
the difference in the adjusted means being
compared.) The denominator is the standard
error of the particular difference.


$$\sigma_{diff} = \sqrt{\left[\frac{2}{n} + \frac{(M_{x_1} - M_{x_2})^2}{\Sigma x^2}\right] s^2}$$

In this formula n is the number of pupils
in each group or class, $M_{x_1}$ and $M_{x_2}$ are the
initial means of the compared classes, $\Sigma x^2$ is
the initial sum of squares “within groups,”
and $s^2$ is the adjusted variance for within
groups. Let us apply this formula to the case
of the two groups for the purpose of
illustration.

$$\sigma_{diff} = \sqrt{\left[\frac{2}{30} + \frac{(16.07 - 10.87)^2}{2490.34}\right] 14.15} = 1.05$$

$$t = \frac{\text{difference}}{\sigma_{diff}} = \frac{3.43}{1.05} = 3.27$$

The value of t exceeds the value of 2.66
necessary for the difference to be regarded as
statistically significant, as explained earlier.
In the case of the comparison of
three different differences between adjusted
means, or adjusted average gains, three t's
would be computed and compared.

A simpler but less exact method of
obtaining the error of the difference is to divide the
adjusted within groups variance by the
number of pupils in each class, take the square
root, and multiply by 1.414. The two
methods will give approximately the same results
if the initial test means of the groups or
classes are not too far apart.¹⁰
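
Both ways of obtaining the standard error of the difference can be compared directly. The sketch below is a hedged illustration using the values of the two-group example; the function names are introduced here and are not part of the original procedure.

```python
import math

def se_difference(s2, n, mean_x1, mean_x2, within_x2):
    """Standard error of a difference between two adjusted means."""
    return math.sqrt(s2 * (2 / n + (mean_x1 - mean_x2) ** 2 / within_x2))

def se_difference_approx(s2, n):
    """Simpler, less exact form: sqrt(s2 / n) multiplied by 1.414."""
    return math.sqrt(s2 / n) * 1.414

s2 = 14.15    # adjusted within groups variance
n = 30        # pupils in each class

se = se_difference(s2, n, 16.07, 10.87, 2490.34)
t = 3.43 / se                     # difference in adjusted means over its standard error
se_approx = se_difference_approx(s2, n)

print(round(se, 2), round(t, 2), round(se_approx, 2))
```

In this example the initial means are more than five points apart, so the simpler form gives a somewhat smaller standard error than the exact formula.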

In footnote 7 it was explained that the
process of changing the $\Sigma y^2$ for within groups
and for total to adjusted sums of squares is
equivalent to the use of the expression
$(1 - r_{xy}^2)$ in the formula for equated groups.
There are, however, two different coefficients
of correlation, one for total and one for
within groups. (The deviate scores of the
former are measured from the general means
of the initial and final scores while in the case
of the latter the deviates are measured from
the group means.) These coefficients may be
calculated by means of the following formula
using the $\Sigma x^2$, $\Sigma y^2$, and $\Sigma xy$ for total in one
case and for within groups in the other.

$$r_{xy} = \frac{\Sigma xy}{\sqrt{\Sigma x^2\,\Sigma y^2}}$$

In the case of our illustrative data $r_{xy}$ for
within groups is:

$$r_{xy} = \frac{1818.47}{\sqrt{2490.34 \cdot 2134.17}} = .7888$$

If 2,134.17 is multiplied by $(1 - .7888^2)$ the
result is 806.29 and would exactly agree with
the value 806.31 previously obtained had more
decimal places been carried. As stated above
$\Sigma y^2$ for total can be adjusted in the same way
using the correlation for total.
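
The equivalence of the two forms of adjustment can be verified numerically. The following Python sketch uses the within groups values of the illustration; the names are illustrative only.

```python
import math

def correlation(sum_xy, sum_x2, sum_y2):
    """Coefficient of correlation from deviation sums of squares and products."""
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

within_x2, within_y2, within_xy = 2490.34, 2134.17, 1818.47

r_within = correlation(within_xy, within_x2, within_y2)

# Two routes to the adjusted sum of squares for within groups.
adjusted_via_r = within_y2 * (1 - r_within ** 2)
adjusted_direct = within_y2 - within_xy ** 2 / within_x2

print(round(r_within, 4), round(adjusted_via_r, 2), round(adjusted_direct, 2))
```

When r is carried at full precision the two routes agree, which is the point made in the text about carrying more decimal places.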


HANDLING THE DATA WHERE MORE THAN
ONE INITIAL TEST IS USED


The method of adjusting the variances in
terms of the coefficients of correlation is
particularly useful when more than
one initial test is used. If, for example, there
are two initial tests and one final test they
may be referred to as $X_1$, $X_2$, and $Y$. (The
$X_1$ and $X_2$ scores of the first class may be
symbolized by $X_{11}$ and $X_{21}$, while $X_1$ and $X_2$
may here refer to the scores of all the pupils.)
After obtaining $\Sigma X_1$, $\Sigma X_2$, $\Sigma Y$, $\Sigma X_1^2$, $\Sigma X_2^2$,
$\Sigma Y^2$, $\Sigma X_1Y$, $\Sigma X_2Y$, and $\Sigma X_1X_2$ for all of the
classes in combination, $\Sigma x_1^2$, $\Sigma x_2^2$, $\Sigma y^2$, $\Sigma x_1y$,
$\Sigma x_2y$, and $\Sigma x_1x_2$ are obtained for total,
between groups, and within groups in the same
way as described for one initial and one final
test. The formulas are the same, only the
symbols differ, and more calculation is
involved. (Each of $\Sigma x_1^2$ and $\Sigma x_2^2$ is calculated in
the same way as $\Sigma x^2$, and $\Sigma x_1y$, $\Sigma x_2y$, and
$\Sigma x_1x_2$ are calculated in the same way as $\Sigma xy$
was originally.) $\Sigma y^2$ for total and $\Sigma y^2$ for
within groups may be adjusted by multiplying
by the factor $(1 - R^2)$ where R stands for a
coefficient of multiple correlation of which
there are two, one for total and one for within
groups. Six ordinary coefficients of correlation
must first be calculated: $r_{x_1y}$, $r_{x_2y}$, and $r_{x_1x_2}$
for within groups and $r_{x_1y}$, $r_{x_2y}$, and $r_{x_1x_2}$ for
total using the appropriate sums of squares and
products. The correlation formula given on
page 234 may be used varying the symbols. For
example,

$$r_{x_1y} = \frac{\Sigma x_1y}{\sqrt{\Sigma x_1^2\,\Sigma y^2}}$$

Then using the appropriate values of $r_{x_1y}$,
$r_{x_2y}$, and $r_{x_1x_2}$ the two R's are obtained
through use of the formula:

$$R^2 = \frac{r_{x_1y}^2 + r_{x_2y}^2 - 2\,r_{x_1y}\,r_{x_2y}\,r_{x_1x_2}}{1 - r_{x_1x_2}^2}$$

¹⁰ See Lindquist (9), page 195.

The reduced sum of squares for between 
groups is obtained as before by subtracting 
the adjusted sum of squares for within groups 
from that for total. The degrees of freedom 
for within groups will need to be reduced 1 
for each additional initial test. If two initial 
tests had been used in our illustration the
degrees of freedom would be 56. The expression
for degrees of freedom can be written
$N - n - i$, where N is the total number of
pupils, n is the number of groups or classes,
and i is the number of initial tests. The
degrees of freedom for between groups would
remain as before 1 less than the number of
groups.
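
A short sketch may help to fix the multiple correlation adjustment and the accompanying degrees of freedom. The three correlations used below are hypothetical values chosen only for illustration, since the text gives no data for a second initial test; the function names are likewise assumptions.

```python
def multiple_r_squared(r_x1y, r_x2y, r_x1x2):
    """Squared multiple correlation of Y with two initial tests X1 and X2."""
    numerator = r_x1y ** 2 + r_x2y ** 2 - 2 * r_x1y * r_x2y * r_x1x2
    return numerator / (1 - r_x1x2 ** 2)

def within_df(n_pupils, n_groups, n_initial_tests):
    """Degrees of freedom for within groups: N - n - i."""
    return n_pupils - n_groups - n_initial_tests

# Hypothetical within groups correlations (not taken from the article).
r_squared = multiple_r_squared(0.79, 0.60, 0.50)

# Adjusting the within groups sum of squares of the illustration by (1 - R^2).
adjusted_within = 2134.17 * (1 - r_squared)

print(round(r_squared, 4), round(adjusted_within, 2))
print(within_df(60, 2, 1), within_df(60, 5, 1), within_df(60, 2, 2))
```

With a single initial test the expression reduces to the square of the ordinary correlation, which is why the one-test and two-test procedures share the same form.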

In adjusting the final means two b's will
be needed, one for each initial test. These
may be calculated by using the formulas:

$$b_{x_1} = \frac{\Sigma x_1y\,\Sigma x_2^2 - \Sigma x_2y\,\Sigma x_1x_2}{\Sigma x_1^2\,\Sigma x_2^2 - (\Sigma x_1x_2)^2}$$

$$b_{x_2} = \frac{\Sigma x_2y\,\Sigma x_1^2 - \Sigma x_1y\,\Sigma x_1x_2}{\Sigma x_1^2\,\Sigma x_2^2 - (\Sigma x_1x_2)^2}$$

In adjusting the final mean of each group,
or class, first find the difference between the
mean of the $X_1$ scores for that group and the
general mean of the $X_1$ scores and multiply
this difference by $b_{x_1}$. Then find the
difference between the mean of the $X_2$ scores for
that group and the general mean of the $X_2$
scores and multiply this difference by $b_{x_2}$.
Subtract both of these quantities from the
mean of the final scores of the group. (As
explained in the first paragraph of the
preceding section.)
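
The two regression coefficients and the resulting adjustment can be sketched as below. Apart from the within groups values carried over from the one-test illustration, the sums of squares and products are hypothetical numbers invented for the example, and the function names are assumptions.

```python
def regression_coefficients(s11, s22, s12, s1y, s2y):
    """Return (b_x1, b_x2) from within groups sums of squares and products.

    s11, s22 are the sums of squares of x1 and x2; s12, s1y, s2y are the
    sums of products of x1 with x2, x1 with y, and x2 with y.
    """
    denominator = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / denominator
    b2 = (s2y * s11 - s1y * s12) / denominator
    return b1, b2

def adjusted_final_mean(mean_y, deviations, coefficients):
    """Subtract b times (group mean - general mean) for each initial test."""
    return mean_y - sum(b * d for b, d in zip(coefficients, deviations))

# Hypothetical within groups values for two initial tests.
s11, s22, s12 = 2490.34, 1800.0, 900.0
s1y, s2y = 1818.47, 1000.0

b1, b2 = regression_coefficients(s11, s22, s12, s1y, s2y)

# A group 2.60 above the general mean on X1 and 1.20 below it on X2.
adjusted = adjusted_final_mean(24.47, (2.60, -1.20), (b1, b2))
print(round(b1, 3), round(b2, 3), round(adjusted, 2))
```

The coefficients returned satisfy the two normal equations simultaneously, which offers a convenient check on hand calculation.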

The differences between the adjusted final
means, or adjusted average gains, can be
tested by means of the t test using a
modification of the formula given in the preceding
section.¹¹

Where there are more than two initial tests 
the experimenter can follow the general pro- 
cedure just described obtaining sums of 
squares for each series of scores and as many 
sums of products of scores as there are ways 
of pairing the scores of the individual pupils 
on each of the tests. The ordinary coefficients 
of correlation may be computed by the 
formula given, but the two multiple
coefficients of correlation and the regression
coefficients, the b's, will need to be calculated
according to one of the procedures to be 
found in any good advanced text in statistical 
method. The inexperienced person will find 
the method outlined in the article by Griffin 
(5) very easy to follow. It should be pointed 
out, however, that the use of more than one 
initial test may not add much to the precision 
of the experiment. It may seem logical to use 
both an intelligence and an initial achieve- 
ment test in statistically equating groups, but 
each multiple correlation coefficient may not 
be much higher than the greater of the ordi- 
nary, or first order, coefficients and, hence, 
the adjustment may not be much greater than 
when only one initial test is used. 


HANDLING DATA FOR AN EXPERIMENT
CONDUCTED IN SEVERAL SCHOOLS


We will assume that in each of several 
schools, selected at random, all of the com- 
pared methods are taught. (If the schools are 
not selected at random, generalizations will 
need to be restricted to similar schools.) We 
will assume also that the classes in all of the
schools are of equal size and have been 
assigned at random to the methods. We will 
also assume that the pupils have been 
assigned at random to the classes, or that the 


¹¹ See Goulden (4), page 253. The standard error is the
square root of the formula given.

TABLE IV
TABLE FOR RECORDING DATA FOR SEVERAL SCHOOLS

              School 1                             School 2
 Method A     Method B     Method C     Method A     Method B     Method C
  X    Y       X    Y       X    Y       X    Y       X    Y       X    Y

use of intact classes does not significantly
violate this assumption.¹²

The test data may be recorded in a table 
similar to Table IV. 

The table will have as many sections as 
there are schools. Each section will have as 
many columns headed “Method” as there are 
methods. If more than one initial test is used,
label them $X_1$, $X_2$, etc., and list them under
each method along with a column for the final
test Y.

After the data have been recorded in a 
table similar to the one illustrated above, 
consideration should be given to testing the 
assumptions of homogeneity of variance and 
homogeneity of regression. 

The procedure to be described in the fol- 
lowing paragraphs introduces a new term, the 
methods X schools variance. This variance is 
due in part to the same sources of error that 
produce the within classes variance. If it is 
larger, the increase may be due to real differ- 
ences in the comparative effectiveness of the 
methods from school to school. The order of 
effectiveness may vary or, if the rank is the 
same, the differences in effectiveness may be 
much more pronounced in some schools than 
in others. On the other hand, the increase 
may be due to irrelevant factors whose effects 
are not completely equalized from class to 
class within each school, or counterbalanced 
when the data from the different schools are 
combined. If, for example, the teachers in a 
majority of schools are more skillful in their 
use of Method A than in their use of Method 
B, while in other schools the teachers are 
more skillful in their use of Method B than 
Method A, this irrelevant factor may result 
in a significant methods X schools variance. 

The methods X schools variance will not, 
however, account for any uncontrolled irrel- 
evant factors which operate similarly, or in 
the same direction, in all of the schools. For 

196-203 of his text, Lindquist describes 


h may be used if the classes are of the como ne 
school, yh A, yO, - 5 




example, if all of the teachers are more skill- 
ful in their use of Method A than Method B, 
this source of error will not be accounted for 
in the methods X schools variance. The 
methods variance will be increased and, if the 
significance of the methods variance is due to 
such irrelevant factors, the experimenter may 
erroneously conclude that there are real dif- 
ferences in the relative effectiveness of the 
methods. 


The separation of a variance for schools 
has the effect of eliminating school differences 
from the measure of error. The older tech- 
nique of pooling the data from several schools 
into one experimental group and one control 
group may have served to eliminate the 
effects of school differences from the methods 
means, but the measures of error traditionally 
used have been so inflated by these school 
differences that the results have usually not 
been statistically significant. For a more ex- 
tended discussion of these matters the reader 
should consult pages 104-132 of Lindquist’s 
text. 


If we let
m = method
s = school
p = pupils in a class
$n_m$ = number of methods or classes in a
school
$n_s$ = number of schools
$n_p$ = number of pupils in a class
$N = n_p\,n_s\,n_m$, the total number of pupils


the sums of squares and products and the
degrees of freedom may be calculated by
means of the following formulas: (The
procedure is based upon that contributed by
Johnson and Tsao (7, 8).)


Sums of Squares, $\Sigma x^2$

Within classes        $a - b$
Methods X Schools     $b - c_s - c_m + d$
Schools               $c_s - d$
Methods               $c_m - d$
Total                 $a - d$





$a$ is the sum of the squares of the X scores of all of the pupils.

$$a = \sum X^2$$

$b$ is obtained by summing the X scores of each class, squaring
each of the sm such sums, totaling these squares, and dividing
by the number of pupils in each class, $n_p$.¹³

$$b = \frac{\sum_{s}\sum_{m}\left(\sum_{p} X\right)^2}{n_p}$$

$c_s$ is obtained by summing the X scores of all the pupils in each
school, squaring the s such sums, totaling these squares, and
dividing by the number of pupils participating in the
experiment in each school, $n_p\,n_m$.

$$c_s = \frac{\sum_{s}\left(\sum_{m}\sum_{p} X\right)^2}{n_p\,n_m}$$

$c_m$ is obtained by summing the X scores of all the pupils taught
by each method, squaring the m such sums, totaling these
squares, and dividing by the number of pupils taught by each
method, $n_p\,n_s$.

$$c_m = \frac{\sum_{m}\left(\sum_{s}\sum_{p} X\right)^2}{n_p\,n_s}$$

$d$ is obtained by summing all of the X scores, squaring this sum
and dividing by N, the total number of pupils.

$$d = \frac{\left(\sum X\right)^2}{N}$$

The sums of squares of the Y scores (and
of any initial tests) may be obtained by
means of the same formulas substituting Y
for X or designating the various initial tests
by $X_1$, $X_2$, etc. The sums of products may
also be obtained by the same formulas. In
the calculation of “a” replace $X^2$ by $XY$.
In the calculation of “b” replace $(\Sigma X)^2$ by
$(\Sigma X)(\Sigma Y)$. In each class sum the X scores,
sum the Y scores, and multiply these sums.
Repeat this for all of the classes, and divide
the total of these products by the number of
pupils in each class. In obtaining $c_s$, sum all
of the X scores for a school, then all of the Y
scores, then multiply these sums. Repeat for
each school, total, and divide by the number
of pupils in each school. In obtaining $c_m$, sum
all of the X scores for a method, then all of
the Y scores for the same method and
multiply these sums. Repeat for each method,
total, and divide by the number of pupils
taught by each method. In obtaining $d$ sum
all of the X scores, then sum all of the Y
scores, multiply these sums, and divide by N,
the total number of pupils. The same
procedure would be used for obtaining such sums
of products as $\Sigma x_1y$, $\Sigma x_2y$, etc. The
accompanying miniature problem illustrates these
calculations.


TABLE V
FICTITIOUS DATA FOR SEVERAL SCHOOLS

(X and Y scores for Methods A, B, and C in each of four schools; N = 60)


For X

$$a = 4^2 + 3^2 + 3^2 + \dots + 2^2 + 4^2 + 2^2$$

$$b = \frac{(4+3+3+4+2)^2 + (3+4+2+2+5)^2 + \dots + (4+3+2+4+2)^2}{5} = \frac{16^2 + 16^2 + 16^2 + \dots + 16^2 + 15^2 + 15^2}{5} = 589.80$$

$$c_s = \frac{(4+3+3+\dots+2+4+3)^2 + \dots + (5+3+2+\dots+2+4+2)^2}{15} = \frac{(48)^2 + (41)^2 + (52)^2 + (46)^2}{15} = 587.00$$

$$c_m = \frac{\dots + (4+3+2+\dots+2+4+2)^2}{20}$$

or

$$c_m = \frac{(16+14+15+16)^2 + (16+12+19+15)^2 + (16+15+18+15)^2}{20} = \frac{(61)^2 + (62)^2 + (64)^2}{20} = 583.05$$

$$d = \frac{(187)^2}{60} = 582.82$$

For XY

$$a = \Sigma XY = 602.00$$

$$b = \frac{16 \times 18 + \dots + 15 \times 16 + 15 \times 15}{5} = 605.40$$

$$c_s = \frac{48 \times 51 + 41 \times 52 + 52 \times 44 + 46 \times 49}{15} = 608.13$$

$$c_m = \frac{61 \times 72 + 62 \times 58 + 64 \times 66}{20} = 610.60$$

$$d = \frac{187 \times 196}{60} = 610.87$$


The dots in the above expressions represent 
omissions. The reader is urged to check these 
calculations as a means of familiarizing him- 
self with the procedure. The reader should 
also note the way in which many of the sums 
recur in later expressions. 
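
As one way of checking these calculations, the terms b, c_s, c_m, and d for X can be rebuilt from the twelve class totals that appear in the expressions above. The sketch below is illustrative only: the layout of the totals by method and school is inferred from those expressions, and the term a is omitted because it requires the individual scores.

```python
# Class totals of X, by method (rows) and by school 1 to 4 (columns).
class_totals_x = {
    "A": [16, 14, 15, 16],
    "B": [16, 12, 19, 15],
    "C": [16, 15, 18, 15],
}
n_p = 5                       # pupils in a class
n_m = len(class_totals_x)     # methods
n_s = 4                       # schools
N = n_p * n_m * n_s           # total number of pupils

class_sums = [t for row in class_totals_x.values() for t in row]
school_sums = [sum(row[s] for row in class_totals_x.values()) for s in range(n_s)]
method_sums = [sum(row) for row in class_totals_x.values()]

b = sum(t ** 2 for t in class_sums) / n_p
c_s = sum(t ** 2 for t in school_sums) / (n_p * n_m)
c_m = sum(t ** 2 for t in method_sums) / (n_p * n_s)
d = sum(class_sums) ** 2 / N

# Sums of squares that do not require the individual scores.
ss_methods_x_schools = b - c_s - c_m + d
ss_schools = c_s - d
ss_methods = c_m - d

print(school_sums, method_sums)
print(round(b, 2), round(c_s, 2), round(c_m, 2), round(d, 2))
print(round(ss_methods_x_schools, 2), round(ss_schools, 2), round(ss_methods, 2))
```

The same arrangement serves for the sums of products if each squared total is replaced by the product of the corresponding X and Y totals.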

The degrees of freedom with which to 
divide the adjusted or reduced sums of 
squares are given by the following expres- 
sions. For each additional initial test subtract 
one more unit from the d.f. for within classes. 


Degrees of Freedom

Within Classes        $N - (n_m - 1)(n_s - 1) - (n_s - 1) - (n_m - 1) - 2$
Methods X Schools     $(n_m - 1)(n_s - 1)$
Schools               $(n_s - 1)$
Methods               $(n_m - 1)$

The degrees of freedom for the illustrative 
data are 47, 6, 3, and 2. 

The number of tests does not affect the d.f. 
for methods X schools, for schools, and for 
methods. 
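
The degrees of freedom can be tabulated with a small helper. This is a minimal sketch; the function name and the treatment of additional initial tests as a parameter are assumptions made for the illustration.

```python
def degrees_of_freedom(N, n_m, n_s, n_initial_tests=1):
    """Degrees of freedom for the several-schools analysis of covariance.

    With one initial test the within classes value is
    N - (n_m - 1)(n_s - 1) - (n_s - 1) - (n_m - 1) - 2; each additional
    initial test removes one more unit from within classes only.
    """
    interaction = (n_m - 1) * (n_s - 1)
    schools = n_s - 1
    methods = n_m - 1
    within = N - interaction - schools - methods - 1 - n_initial_tests
    return {
        "within": within,
        "methods_x_schools": interaction,
        "schools": schools,
        "methods": methods,
    }

# Illustrative data: 60 pupils, 3 methods, 4 schools, one initial test.
print(degrees_of_freedom(60, 3, 4))
```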

The sums of squares and products may be 
listed in the first three columns of a table 
similar to Table VI. 


TABLE VI
TABLE FOR REPORTING ANALYSIS OF COVARIANCE FOR SEVERAL SCHOOLS

                      Sums of Squares and Products       Adjusted or Reduced
                      $\Sigma x^2$    $\Sigma xy$    $\Sigma y^2$        $\Sigma y^2$    d.f.    Variance
Within Classes
Methods X Schools
Schools
Methods
Total

The calculation of the entries for the last
three columns is explained in the following
paragraphs:

First find the adjusted sum of squares for
within classes through the use of the formula
$\Sigma y^2 - \frac{(\Sigma xy)^2}{\Sigma x^2}$ using the sums of squares and
products in the first line of the table. Then
find the adjusted sum of squares for within
classes + M X S by combining the sums of
squares and products in the first two lines and
substituting these new values of $\Sigma x^2$, $\Sigma xy$,
and $\Sigma y^2$ in the formula. Subtract the adjusted
sum of squares for within classes from the
adjusted sum of squares for within classes
+ M X S to obtain the reduced sum of
squares for M X S. Next calculate the
reduced variances for schools and for methods.
The first step is to calculate the adjusted sum
of squares for within classes + schools by
combining the sums of squares and products
in the within classes and schools lines of the
table and substituting in the formula.
Subtraction of the adjusted sum of squares for
within classes yields the reduced sum of
squares for schools. The reduced sum of
squares for methods is obtained similarly.
The adjusted within classes sum of squares
and the reduced sums of squares for methods
X schools, schools, and methods are then
divided by their respective degrees of freedom
to obtain the corresponding variances.
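
The combine-and-subtract step is the same for every line of the table, so it can be written once. The sketch below is a hedged illustration: since no figures are given for Table VI, it is exercised on the within groups and between groups lines of Table III, and the function names are assumptions.

```python
def adjusted_ss(x2, xy, y2):
    """Adjusted sum of squares: sum y^2 - (sum xy)^2 / sum x^2."""
    return y2 - xy ** 2 / x2

def reduced_ss(error_line, effect_line):
    """Reduced sum of squares for an effect.

    Each line is a tuple (sum x^2, sum xy, sum y^2). The error and effect
    lines are added term by term, the combined line is adjusted, and the
    adjusted error sum of squares is subtracted.
    """
    combined = tuple(e + f for e, f in zip(error_line, effect_line))
    return adjusted_ss(*combined) - adjusted_ss(*error_line)

# Lines of Table III used as a check (within groups as error line).
within = (2490.34, 1818.47, 2134.17)
between = (405.59, 28.60, 2.02)

reduced_between = reduced_ss(within, between)
print(round(reduced_between, 2))
```

For the several-schools analysis the same function would be called with the within classes line as the error line and, in turn, the methods X schools, schools, and methods lines as the effect line.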

If the reduced variance for methods is not 
significant when compared with the adjusted 
within classes variance, there is no evidence 
so far as the experimental data are concerned 
that the methods vary in effectiveness. 

If the reduced methods variance and the 
reduced methods X schools variance are both 
significant when comparison is made with the 
adjusted within classes variance,¹⁴ in order

¹⁴ According to the procedure described by Lindquist (9),
the M x S variance is used as the error variance when it is 


wEREEE 


in 


it exists, or with 


‘within’ when 
when no ‘within’ exists, in order to increase the precision.” 




for the experimenter to conclude that the 
relative effectiveness of the methods is as 
shown by significant differences in the adjusted
methods means,¹⁵ the methods X schools
variance will need to be attributable to irrelevant
factors which were not completely equalized 
or counterbalanced when the data from the 
different schools are combined. If analyses are 
made of the data of the individual schools 
according to the procedures previously de- 
scribed for a single school, a few anomalous 
schools may be identified in which the rela- 
tive effectiveness of the methods markedly 
differs from that of most of the schools. Evi- 
dence, other than the test data, may be col- 
lectible which so explains the differing effec- 
tiveness of the methods in the anomalous 
schools in terms of dissimilar school condi- 
tions that the variation in effectiveness of the 
methods may safely be considered legitimate, 
or the supplementary data may reveal that 
irrelevant factors operated chiefly in the 
anomalous schools. The pooling of the data of 
the rest of the schools may now yield a non- 
significant methods X schools variance and a 
significant methods variance. Further treat- 
ment of the data as suggested in the next 
paragraph may lead to a justifiable general- 
ization that the relative effectiveness of the 
methods shown for these schools applies to 
similar schools, i.e., to schools not
characterized by the conditions discovered in the
anomalous schools. 

¹⁵ As adjusted by a regression coefficient computed from the
within classes $\Sigma xy$ and $\Sigma x^2$ and with a standard
error computed from the adjusted within classes variance.

If the reduced methods variance is
significant when compared with the adjusted within
classes variance, but the reduced methods X
schools variance, though larger than the
adjusted within classes variance, is not
significantly larger, the within classes and methods
X schools sums of squares and products may
be combined to obtain “residual” sums of
squares and products. These new values of
$\Sigma x^2$, $\Sigma xy$, and $\Sigma y^2$ are then substituted in the
formula to obtain an adjusted residual sum
of squares. A new reduced sum of squares for
schools and a new reduced sum of squares for
methods are next obtained, first computing
adjusted sums of squares for residual +
schools and residual + methods. The
adjusted residual sum of squares is divided by
the sum of the degrees of freedom for the
original adjusted within classes variance and
the reduced methods X schools variance. The
new reduced variances for schools and for
methods are obtained by dividing by the same
degrees of freedom previously used. The
significance of these variances can then be tested
by comparison with the residual variance. In
this situation and in the one described in the
following paragraph, a significant reduced
methods variance leads to a conclusion not
restricted to a part of the schools, unless the
data thus analyzed refer to only a part of the
schools in the original sample. A new table
should be prepared listing the sums of squares
and products, the adjusted and reduced
variances, and the degrees of freedom for residual,
schools, methods, and total. The residual $\Sigma xy$
and $\Sigma x^2$ are used in obtaining the regression
coefficient used in adjusting the final methods
means. The standard error of the differences
between the various pairs of methods means
may be obtained by the formula on page 234.
(n will refer to the total number of pupils
taught by a given method, $\Sigma x^2$ is the residual
$\Sigma x^2$, and $s^2$ is the adjusted residual variance.)
If the initial test means are not greatly
different for each of the total methods groups,
the less precise standard error may be used.
The adjusted residual variance is divided by
the number of pupils per method, the square
root is taken, and the result is multiplied by
1.414. In both cases the degrees of freedom
for t is that of the residual variance.¹⁶

If, as rarely happens, the reduced variance
for methods originally obtained was
significant, but the methods X schools variance was
equal to or less than the adjusted within
classes variance, the procedure described in
the last paragraph need not be used. The
within classes $\Sigma xy$ and $\Sigma x^2$ may be used in
obtaining the regression coefficient for
adjusting the methods means and the adjusted


In a letter to the writer, FEE expressed distrust for 
pooling the within classes and M x S sums of squares and 
products unless there is no significant difference between re- 

coefficients obtained from the within ram and the 


ificantly" different from 
4 believe that their r 
coefficients are significantly different from that of ‘wi _ 




within classes variance rather than the re- 
sidual variance may be used in computing the 
standard error. The degrees of freedom for t
would then be the degrees of freedom of the 
adjusted within classes variance. 


If more than one initial test is used, each
adjusted sum of squares referred to in the
preceding paragraphs can be obtained by
multiplying the appropriate original Σy² by
the quantity (1 − R²), where R is in each case
the multiple correlation calculated from the
sums of squares and products in the line (or
combination of lines) to which the adjusted
sum of squares is to refer.
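The adjustment by (1 − R²) can be sketched in Python as below. This is a minimal illustration under stated assumptions: the function names are invented for the example, the sums of squares and products are taken to be deviation sums from a single line of the analysis, and the numbers are hypothetical.

```python
def r_squared(sxx, sxy, syy):
    """R^2 from deviation sums of squares and products.

    sxx: k x k matrix of sums of squares and products among the k initial tests
    sxy: list of k sums of products of each initial test with Y
    syy: sum of squares of Y
    """
    k = len(sxy)
    # Solve sxx * b = sxy by Gauss-Jordan elimination with partial pivoting.
    a = [[float(v) for v in sxx[i]] + [float(sxy[i])] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [x - factor * y for x, y in zip(a[row], a[col])]
    b = [a[i][k] / a[i][i] for i in range(k)]
    return sum(bi * si for bi, si in zip(b, sxy)) / syy

def adjusted_sum_of_squares(sxx, sxy, syy):
    """Original sum of squares of Y multiplied by (1 - R^2)."""
    return syy * (1.0 - r_squared(sxx, sxy, syy))

# Hypothetical sums for one initial test and for two initial tests.
one_test = adjusted_sum_of_squares([[50.0]], [30.0], 40.0)
two_tests = adjusted_sum_of_squares([[50.0, 10.0], [10.0, 40.0]], [30.0, 20.0], 40.0)
```

With a single initial test the result equals Σy² − (Σxy)²/Σx², so the general rule contains the one-test adjustment as a special case.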


HANDLING DATA FOR AN EXPERIMENT
INVOLVING AN ADDITIONAL VARIABLE


Let us now assume that an experiment is 
to be concerned with more than just the 
evaluation of the relative merits of certain 
methods. The experimenter may desire to 
evaluate the methods in combination with 
certain devices or in combination with dif- 
ferent types of curriculum materials. Each 
class is divided at random into sub-groups, 
one sub-group to be subjected to each device 
or type of curriculum material. Such an ex- 
periment may serve to show what methods 
are best, what devices are best, and what 
methods work best with what devices.¹⁷

Another possibility is an evaluation of 
methods for pupils of different levels of abil- 
ity as determined by the initial test or tests, 
the number of pupils in each class assigned 
to the same level being equal. All of the 
classes in the experiment should be of approx- 
imately the same general level and spread of 
initial ability. It will then be possible to 
determine if certain methods are relatively 
more effective for pupils of superior ability, 
average ability, or inferior ability. It may be 
shown that the effectiveness of a method does 
not depend on the level of ability. 

In handling the data of either of the types 
of experiments referred to above the original 
data table may be divided into horizontal sec- 
tions for devices, curriculum materials, or 
levels. In the procedure outlined below we 
will refer for convenience to devices. 

Let d = devices, n_d = the number of
devices, and n_t = the number of pupils per
device.

17 In conducting such an experiment in a single school
[remainder of footnote illegible].
Source of Variation              Sums of Squares, or Products

Within Devices                   a − b
Schools × Methods × Devices      b − c_sm − c_sd − c_md + d_s + d_m + d_d − e
Schools × Methods                c_sm − d_s − d_m + e
Schools × Devices                c_sd − d_s − d_d + e
Methods × Devices                c_md − d_m − d_d + e
Schools                          d_s − e
Methods                          d_m − e
Devices                          d_d − e
Total                            a − e

The quantities a to e are computed as follows. The subscripts
s, m, d, and t under a summation sign refer to schools, methods,
devices, and pupils within a sub-group, and N is the total number
of pupils.

a. Sum the squares of all the scores.

\[ a = \sum_{smdt} X^{2} \]

b. Sum the scores in each sub-group, square the sum, total the
squares for all sub-groups, and divide by the number of pupils
in a sub-group.

\[ b = \frac{\sum_{smd} \left( \sum_{t} X \right)^{2}}{n_t} \]

c_sm. Sum the scores in each class, square the sum, total the squares
for all classes, and divide by the number of pupils in a class.

\[ c_{sm} = \frac{\sum_{sm} \left( \sum_{dt} X \right)^{2}}{n_d n_t} \]

c_sd. Sum the scores of the pupils subjected to the same device in
each school (regardless of method), square the sum, total the
squares for all sd such groups, and divide by the number of
pupils subjected to a given device in each school (n_m n_t).

\[ c_{sd} = \frac{\sum_{sd} \left( \sum_{mt} X \right)^{2}}{n_m n_t} \]

c_md. Sum the scores of all of the pupils taught by the same method
and device in all of the schools, square the sum, total the
squares for all md such groups, and divide by the number of
pupils taught by a given method and given device (n_s n_t).

\[ c_{md} = \frac{\sum_{md} \left( \sum_{st} X \right)^{2}}{n_s n_t} \]

d_s. Sum the scores of all of the pupils in a given school, square the
sum, total the squares for all s schools, and divide by the
number of pupils in the school.

\[ d_{s} = \frac{\sum_{s} \left( \sum_{mdt} X \right)^{2}}{n_m n_d n_t} \]

d_m. Sum the scores of all of the pupils taught by a given method
in all the schools, square the sum, total the squares for all m
methods, and divide by the number of pupils taught by a given
method.

\[ d_{m} = \frac{\sum_{m} \left( \sum_{sdt} X \right)^{2}}{n_s n_d n_t} \]

d_d. Sum the scores of all of the pupils subjected to a given device
in all the schools, square the sum, total the squares for all d
devices, and divide by the number of pupils subjected to a
given device.

\[ d_{d} = \frac{\sum_{d} \left( \sum_{smt} X \right)^{2}}{n_s n_m n_t} \]

e. Sum the scores of all of the pupils, square, and divide by the
total number of pupils.

\[ e = \frac{\left( \sum_{smdt} X \right)^{2}}{N} \]

Y, X_1, or X_2, etc. may be used in the formulas instead of X.
XY may replace X² in the first formula and (ΣX)(ΣY) may
replace (ΣX)² in the second formula, etc. Be sure to sum each
group of X scores and each group of Y scores before securing a
product (ΣX)(ΣY).

The above procedure may be more clear if
the diagrams of Table VII are studied.
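The quantities a through e and the sums of squares built from them can be sketched in Python as follows. This is a minimal illustration, not the author's computing routine: the function name, the nested-list layout X[s][m][d] (one list of scores per sub-group), and the generated scores are all assumptions made for the example, and every sub-group is assumed to hold the same number of pupils.

```python
from itertools import product

def sums_of_squares(X):
    """X[s][m][d] is the list of scores for school s, method m, device d."""
    n_s, n_m, n_d, n_t = len(X), len(X[0]), len(X[0][0]), len(X[0][0][0])
    N = n_s * n_m * n_d * n_t
    # Sum of the scores in each sub-group.
    cell = {(s, m, d): sum(X[s][m][d])
            for s, m, d in product(range(n_s), range(n_m), range(n_d))}

    def pooled(keep, divisor):
        # Add the sub-group sums over the indices not kept, square each
        # resulting total, add the squares, and divide.
        totals = {}
        for key, value in cell.items():
            group = tuple(key[i] for i in keep)
            totals[group] = totals.get(group, 0.0) + value
        return sum(v * v for v in totals.values()) / divisor

    a = sum(x * x for school in X for cls in school for group in cls for x in group)
    b = pooled((0, 1, 2), n_t)
    c_sm = pooled((0, 1), n_d * n_t)
    c_sd = pooled((0, 2), n_m * n_t)
    c_md = pooled((1, 2), n_s * n_t)
    d_s = pooled((0,), n_m * n_d * n_t)
    d_m = pooled((1,), n_s * n_d * n_t)
    d_d = pooled((2,), n_s * n_m * n_t)
    e = pooled((), N)
    return {
        "Within Devices": a - b,
        "S x M x D": b - c_sm - c_sd - c_md + d_s + d_m + d_d - e,
        "S x M": c_sm - d_s - d_m + e,
        "S x D": c_sd - d_s - d_d + e,
        "M x D": c_md - d_m - d_d + e,
        "Schools": d_s - e,
        "Methods": d_m - e,
        "Devices": d_d - e,
        "Total": a - e,
    }

# Made-up scores: four schools, three methods, two devices, two pupils each.
X = [[[[(7 * s + 3 * m + 5 * d + 2 * t + s * m * d) % 11 for t in range(2)]
       for d in range(2)] for m in range(3)] for s in range(4)]
ss = sums_of_squares(X)
```

A useful check on hand computation is that the eight component sums of squares add up to the total, a − e. Sums of products are obtained in the same way by replacing each square with the product of the corresponding X and Y sums.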


Assume three methods and two devices in
four schools. The shaded area in each case
represents a group of X scores to be summed
and squared, or a group of Y scores to be
summed and squared, or a group of X scores
and a group of Y scores whose sums are to be
multiplied. The number of such areas to be
summed before squaring or multiplying is
indicated at the right. No diagrams are
needed for a or e.

TABLE VII
DIAGRAMS TO ILLUSTRATE CALCULATIONS OF SUMS OF SQUARES AND PRODUCTS
[Diagrams not reproduced. Surviving labels: schools I, II, III, IV; Device 2; smd = 24 such areas; sm = 12 such areas.]

The degrees of freedom for the adjusted or reduced variances:

Within Devices                   N − (*) − 2
Schools × Methods × Devices      (n_s − 1)(n_m − 1)(n_d − 1)
Schools × Methods                (n_s − 1)(n_m − 1)
Schools × Devices                (n_s − 1)(n_d − 1)
Methods × Devices                (n_m − 1)(n_d − 1)
Schools                          n_s − 1
Methods                          n_m − 1
Devices                          n_d − 1

* Equals the sum of all the other degrees of freedom listed.
The within devices d.f. is one unit less for each additional
initial test.

Johnson and Tsao recommend the combining
of such interaction terms as S × M × D,
S × M, etc., which do not prove significant,
with the error term used as the basis of
comparison to obtain a "residual sum of squares"
and a "residual variance" to be used as the
error measure and in adjusting the final
means. The appropriate degrees of freedom
and the appropriate sums of squares and
products are combined as explained on pages
239-240. The table for reporting the final
data is similar to those previously given.
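The degrees of freedom for the adjusted variances can be tallied with a short Python sketch. The function name is hypothetical, and the total of 240 pupils is an assumed figure chosen only to complete the example of three methods and two devices in four schools.

```python
def degrees_of_freedom(n_s, n_m, n_d, N, initial_tests=1):
    """Degrees of freedom for the adjusted (reduced) variances."""
    df = {
        "S x M x D": (n_s - 1) * (n_m - 1) * (n_d - 1),
        "S x M": (n_s - 1) * (n_m - 1),
        "S x D": (n_s - 1) * (n_d - 1),
        "M x D": (n_m - 1) * (n_d - 1),
        "Schools": n_s - 1,
        "Methods": n_m - 1,
        "Devices": n_d - 1,
    }
    others = sum(df.values())
    # N - (*) - 2 with one initial test; one unit less for each further test.
    df["Within Devices"] = N - others - 1 - initial_tests
    return df

df_one_test = degrees_of_freedom(n_s=4, n_m=3, n_d=2, N=240)
df_two_tests = degrees_of_freedom(n_s=4, n_m=3, n_d=2, N=240, initial_tests=2)
```

In this assumed case the seven lines other than within devices account for 23 degrees of freedom, leaving 215 for within devices with one initial test.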


If the reduced S × D variance is significant
and irrelevant factors have been controlled,
the comparative effectiveness of the
devices varies from school to school. Similarly,
if the M × D variance is significant and
irrelevant factors have been controlled, the
comparative effectiveness of the devices varies
from method to method, i.e., the effectiveness
of a device depends on the method with which
it is used. A similar interpretation would be
made of a significant methods × curriculum
materials variance or a significant methods ×
levels variance. A significant M × L variance
would indicate that the effectiveness of the
methods is related to the intelligence, or to
the initial achievement levels, of the pupils.
A given method may vary in effectiveness
with the level of the pupils to whom it is
applied.

It is hoped that the reader will find this 
paper useful in planning and in conducting an 
experiment. The reader is urged not to depend 
on this paper exclusively. The references 
given below should be studied for further in- 
formation with respect to assumptions and to 
matters of interpretation. The texts by Lind- 
quist and by Snedecor will be particularly 
helpful. The two articles by Johnson and 
Tsao will be of great value to the experi- 
menter who wishes to conduct an elaborate 
experiment of the type just outlined. 
EXPECTANCY TABLES: A METHOD OF INTERPRETING
CORRELATION COEFFICIENTS 


REIGN H. BITTNER 
Owens-Illinois Glass Company, Toledo, Ohio 


CARLTON E. WILDER 
Captain, A. G. D., Northington General Hospital, A. S. F. 


The product moment correlation coefficient 
is commonly used by research workers as a 
statistical measure of the relationship between 
two measures of a group of individuals. In 
applied psychology this statistic is often used 
to define the correspondence between scores 
on tests or other measures of traits and apti- 
tudes and a criterion of proficiency on a job, 
in a training course in school or industry, or 
some other activity. In this case the ultimate 
objective is to evaluate the criterion poten- 
tialities of an individual or a group on the 
basis of other data. 

It is often desirable and sometimes neces- 
sary to present the results of correlation anal- 
ysis to the layman. However, the concept of 
correlation is difficult to interpret to the non- 
technically trained person. In addition, the 
coefficient itself provides neither the techni- 
cian nor the layman with a concrete evalua- 
tive statement of the expected criterion per- 
formance of individuals with given scores on 
the associated variable. Accordingly, a method 
is needed for reinterpreting correlation coeffi- 
cients in terms of the predictive significance 
of the association expressed in terms of 
simple mathematical concepts which are
universally understood.¹

Taylor and Russell (4) have developed 
tables for interpreting certain correlation 
coefficients in terms of the proportion of the 
selected group who will be expected to be suc- 
cessful if a given percent of the total group is 
considered successful on the basis of a mini- 
mum criterion score and a given percent of 
the total group is selected on the basis of a 
minimum score on the predictor variable. This 
method has considerable merit for interpret- 
ing the correlation coefficient in terms of its 
predictive significance if one is interested only 


1 Various functions of the correlation coefficient such as the
coefficient of alienation, k, the coefficient of forecasting
efficiency, E, and the coefficient of determination, r², have been
used to aid in the interpretation of the correlation coefficient
but none of these is superior to the coefficient itself for these
purposes.


in group predictions and if it can be assumed 
that the correlation surface is normal. How- 
ever, since the interpretation of the correlation 
coefficient in terms of the expectancy of suc- 
cess for persons making given predictor scores 
is often of primary interest, another method is 
needed. 

A method for interpreting selected correla- 
tion coefficients in terms of individual predic- 
tions has been given in tables such as those 
published by Peters and Van Voorhis (3) and 
Bingham (1). The former table gives the 
probability of an individual reaching or ex- 
ceeding a given decile interval of the criterion 
distribution knowing the individual’s decile 
standing on the predictor variable, and the 
latter gives a person’s most probable standard 
score in the criterion for certain selected 
standard scores in the predictor variable. 
These tables are useful provided the assump- 
tion of a normal correlation surface is tenable 
but both lack completeness and the latter is 
couched in complex mathematical terms. A 
more flexible method is desirable which will 
enable predictions to be made with ease in 
generally understood mathematical terms for
any specified combination of scores on the 
correlated variables and any correlation 
coefficient. 

Correlation coefficients may be recast as
expectancy tables which show the percentage
of persons making a given score on one variable
who will be expected to equal or exceed
a given score on the associated variable. For
example, if a correlation between a test of
scholastic aptitude and school grades in a certain
course is .51, a typical expectancy table
might show that for persons making test
scores of 60, 80, 100, 120, and 140, the
percentages of these students expected to be
successful in the course would be 3, 14, 39, 70,
and 91 respectively. The degree of relationship
is demonstrated by the concurrent increase
in the percentages as test scores increase,
and an individual's test score can be
interpreted in terms of that individual's
chances of success. Correlation results presented
in this form are readily understood by
the nontechnically trained person. In addition,
a simple mathematical statement is provided
which evaluates the association in
terms of its predictive significance.

The computation of expectancy tables
from the product moment correlation coefficient,
r, may be simplified greatly by the use
of appropriate formulae and facilitating
tables. Formulae will be developed and facilitating
tables presented for the general case
and one special case of interest. In the general
case, the problem is to determine for any r_xy
the percentage of persons making any given
X score who would be expected to equal or
exceed a given critical Y score. If the critical
Y score is taken as the mean of the Y scores,
a special case is presented for which percentages
can be tabled, thus eliminating further
computations.

TABLE I

PERCENT P EXPECTED TO MAKE CRITERION SCORE EQUAL TO OR EXCEEDING
THE GIVEN CRITICAL SCORE Y_c FOR ALL VALUES OF z_c

z_c       P when sign of z_c is:
            +       −
.0125      50      51
.0376      49      52
.0627      48      53
.0878      47      54
.1130      46      55
.1383      45      56
.1637      44      57
.1891      43      58
.2147      42      59
.2404      41      60
.2663      40      61
.2924      39      62
.3186      38      63
.3451      37      64
.3719      36      65
.3989      35      66
.4261      34      67
.4538      33      68
.4817      32      69
.5101      31      70
.5388      30      71
.5681      29      72
.5978      28      73
.6280      27      74
.6588      26      75

[Only the z_c entries through .6588 and the first two entries of the
negative column are legible in this copy; the other P entries shown
follow from the reading rules below. The table continues to
z_c = 2.5758.]

Determine value of P for a given value of z_c as follows:

1. When sign of z_c is positive, locate tabular
value of z_c equal to or next larger than the
absolute value of z_c and read P in column
for positive values of z_c. (For +z_c with
absolute value greater than 2.5758, P equals
zero.)

2. When sign of z_c is negative, locate tabular
value of z_c equal to or next smaller than
the absolute value of z_c and read P in column
for negative values of z_c. (For −z_c
with absolute value less than .0125, P
equals 50.)

GENERAL CASE 


The predicted score Ŷ for a given X score
when computed from the linear regression
equation may be assumed to be the mean of
a normally distributed array of Y scores
corresponding to the given X score. The standard
deviation of this distribution is given by
the customary formula for the standard error
of estimate, σ_y √(1 − r²). Now let Y_c designate
the critical Y score. It follows then that the
deviation of Y_c from the mean of the array
(Ŷ) expressed in terms of the standard deviation
of the array is

\[ z_c = \frac{Y_c - \hat{Y}}{\sigma_y \sqrt{1 - r^2}} \tag{1} \]

Substituting in equation (1) for Ŷ its value from the
regression equation, Ŷ = r(σ_y/σ_x)(X − M_x) + M_y, gives

\[ z_c = \frac{Y_c - r\frac{\sigma_y}{\sigma_x}X + r\frac{\sigma_y}{\sigma_x}M_x - M_y}{\sigma_y \sqrt{1 - r^2}} \tag{2} \]

Reducing equation (2) and rearranging terms
gives

\[ z_c = \left[ \frac{Y_c - M_y}{\sigma_y \sqrt{1 - r^2}} + \frac{r M_x}{\sigma_x \sqrt{1 - r^2}} \right] - \frac{r}{\sigma_x \sqrt{1 - r^2}}\, X \tag{3} \]

In equation (3), the quantity enclosed in
brackets is a constant for any given problem
as is the coefficient of X. Let the former constant
be designated as K_1 and the latter as
K_2. Then equation (3) may be written

\[ z_c = K_1 - K_2 X \tag{4} \]

which is the equation desired.

The computation of the percentage of persons
expected to equal or exceed the given
criterion score (Y_c) for each of a series of
predictor scores (X) may now be made.
Determine the constants K_1 and K_2 in equation
(4). Then solve equation (4) for z_c
corresponding to all or selected values of X.
Finally, percentages are determined by evaluating
the probability integral for the portion
of the normal curve lying between the
computed z_c and z_c = ∞. Although these values
may be determined from the customary tables
of the normal probability integral, Table I
enables reading them to the nearest percent
without the necessity for further manipulation
of the tabular values required by the orientation
of the customary tables. Table I thus
simplifies and reduces errors in the determination
of the percentages.


An example will illustrate the interpretation
of the correlation coefficient by this method.
Assume that the correlation between an aptitude
test X and grades in a training course Y
is .45 and M_x = 105.2, σ_x = 18.7, M_y =
80.1, σ_y = 7.5. It is desired to determine the
percentage of persons with test scores of 50,
75, 100, 125, and 150 expected to equal or
exceed a grade of 75 (Y_c). The constants K_1
and K_2 in equation (4) are calculated to be
2.0733 and .0269 respectively. Then solving
equation (4) successively for the test scores,
z_c values of .7260, .0523, −.6214, −1.2950,
and −1.9687 are obtained. Referring to
Table I, the desired percentages are found to
be 23, 48, 73, 90, and 98. They may now be
presented in an expectancy table as in Table
II or graphically as in Figure 1.
TABLE II

EXPECTANCY OF MAKING A GRADE OF 75 OR
HIGHER FOR PERSONS WITH GIVEN
APTITUDE TEST SCORES

Aptitude          Percent Expected to
Test Score        Make Grade of 75 or Higher
  50                      23
  75                      48
 100                      73
 125                      90
 150                      98
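The general-case computation can be sketched in Python as follows. This is an illustration rather than the authors' procedure: the function names are invented for the example, and the normal probability integral is evaluated directly with the complementary error function instead of being read from Table I.

```python
import math

def upper_tail_percent(z):
    """Percent of a normal distribution lying between z and plus infinity."""
    return 100.0 * 0.5 * math.erfc(z / math.sqrt(2.0))

def expectancy_constants(r, m_x, s_x, m_y, s_y, y_c):
    """Constants K_1 and K_2 of equation (4), z_c = K_1 - K_2 * X."""
    root = math.sqrt(1.0 - r * r)
    k1 = (y_c - m_y) / (s_y * root) + r * m_x / (s_x * root)
    k2 = r / (s_x * root)
    return k1, k2

# Values from the worked example in the text.
k1, k2 = expectancy_constants(r=0.45, m_x=105.2, s_x=18.7,
                              m_y=80.1, s_y=7.5, y_c=75.0)
scores = [50, 75, 100, 125, 150]
z_values = [k1 - k2 * x for x in scores]
percents = [round(upper_tail_percent(z)) for z in z_values]
```

Carrying K_2 to more decimal places than the printed .0269 matters here, since the z_c values are obtained by multiplying it by scores as large as 150.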


SPECIAL CASE 


It is often convenient and useful to develop
expectancy tables which will give the percentage
of persons having any given X score who
will be expected to make Y scores equal to or
exceeding the average Y score. If no definite
critical Y score has been determined, the average
Y score is a particularly meaningful reference
point and its selection leads to convenient
simplification of the computation of the
desired percentages. If in equation (3) M_y
is substituted for Y_c, then the equation
becomes

\[ z_c = \frac{r M_x}{\sigma_x \sqrt{1 - r^2}} - \frac{r X}{\sigma_x \sqrt{1 - r^2}} \tag{5} \]

or

\[ z_c = -\frac{r}{\sqrt{1 - r^2}} \cdot \frac{X - M_x}{\sigma_x} \tag{6} \]

and finally, writing z_x for the standard score (X − M_x)/σ_x,

\[ z_c = -\frac{r\, z_x}{\sqrt{1 - r^2}} \tag{7} \]

which is the equation desired.

Equation (7) enables easy determination
of the percentage of persons having any given
X score expected to obtain a Y score equal
to or exceeding the average Y score. The
equation is solved for z_c when X takes the
values desired. The percentages may then be
obtained from Table I or from the usual
tables for the area under the normal curve by
determining the area from z_c to z_c = ∞.

A table may be developed from which the
percentages in this special case can be read
directly for any given z_x and any r. Table III
gives the percentages for any z_x and any r
between .20 and .79 inclusive. The table was
computed by calculating the limiting z_x values
which will determine any given expected
percentage, P, of equalling or exceeding M_y. The
z_c value for the smallest decimal value which
when rounded gives P was substituted in
equation (7) and the corresponding z_x computed.
For example, if P = 34, the smallest
value of P was taken to be 33.5. Reading z_c
from appropriate tables (2) gives .426148.
Substituting for z_c in equation (7) and solving
for z_x when r takes values of .20 to .79
in turn gives the tabular values of z_x for P =
34. Values of z_x greater than 4.00 are omitted
from the table since they are seldom of interest.
The P value corresponding to z_c =
−.426148 is 66.5 which is the smallest decimal
value that gives P = 67 when rounded.
Accordingly, the tabular values of z_x computed
for P = 34 are the same in absolute
value for P = 67. However, since the limiting
absolute values of z_x decrease successively for
P values from 1 to 50 and then increase
successively for P values from 51 to 100, different
procedures are required to locate appropriate
limiting tabular values of z_x when z_x
is positive or negative.
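The construction of a limiting z_x value can be sketched in Python as below. This is an assumption-laden illustration: the function name is hypothetical, and the inverse normal integral is taken from `statistics.NormalDist` in the Python standard library rather than from the printed tables the authors cite.

```python
import math
from statistics import NormalDist

def limiting_z_x(p, r):
    """Limiting z_x for a tabled percentage p (1 to 99) and correlation r."""
    smallest = p - 0.5  # smallest decimal value which rounds to p
    # z_c that leaves `smallest` percent of the normal curve above it.
    z_c = NormalDist().inv_cdf(1.0 - smallest / 100.0)
    # Equation (7) solved for z_x.
    return -z_c * math.sqrt(1.0 - r * r) / r

z_c_for_34 = NormalDist().inv_cdf(1.0 - 33.5 / 100.0)
z_x_for_34 = limiting_z_x(34, 0.45)
z_x_for_67 = limiting_z_x(67, 0.45)
```

The values for P = 34 and P = 67 come out equal in absolute value and opposite in sign, which is the symmetry the text relies on to fill both halves of the table from one set of computations.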

The procedures for finding in Table III the
percentage P of persons having any given X
score who will be expected to make Y scores
equal to or exceeding the average Y score
differ when z_x is positive or negative. After
X scores have been converted to standard
scores (z_x), locate the column for the appropriate
r_xy and proceed as follows to locate z_x
values in this column and to find corresponding
P values:

1. When sign of z_x is positive, locate tabular
value of z_x equal to or next smaller than
the absolute value of z_x and read P in marginal
column for positive values of z_x. (For
+z_x with an absolute value less than the
smallest tabular value in the column, P =
50.)

2. When sign of z_x is negative, locate tabular
value of z_x equal to or next larger than
the absolute value of z_x and read P in marginal
column for negative values of z_x. (For
−z_x with an absolute value larger than the
tabular value corresponding to P = 1, P =
zero.)

It should be noted that the differential procedure
for locating in Table III the appropriate
z_x values according to their positive or
negative signs is the reverse of that given for
z_c values in Table I. This is necessary since
z_x and z_c for a given P value have opposite
signs.
TABLE III (Continued)

PERCENTAGE P OF PERSONS HAVING ANY GIVEN z_x SCORE WHO WILL BE EXPECTED TO MAKE Y SCORES EQUAL TO OR EXCEEDING THE
AVERAGE Y SCORE WHEN r_xy EQUALS .40 TO .59

Values of z_x according to correlation coefficient r_xy

[Table body not reproduced.]
The example used for the general case will
serve to illustrate the use of Table III. It is
now desired to estimate the percent of persons
having test scores of 50, 75, 100, 125,
and 150 who will be expected to make grades
equal to or exceeding the average course
grade. The test scores converted to standard
scores give z_x values of −2.95, −1.61, −.28,
+1.06, and +2.40 respectively. Now considering
a test score of 50, in the column for
r = .45 find the value equal to or next larger
than 2.95. In this case, 3.00 is located since
2.95 is not given in the column. The percent
P is now read from the marginal column for
−z_x and found to be 7. Similarly, the percents
for test scores of 75 and 100 are found
to be 21 and 44 respectively. The procedure
for test scores of 125 and 150 is different
since their standard scores are positive. For a
test score of 125, locate in the r = .45
column the value equal to or next smaller
than 1.06. This is found to be 1.01 and reading
P from the marginal column for +z_x
gives 70. Similarly, the percent for a test score
of 150 is found to be 89. These percents can
now be presented as an expectancy table or
graphically as shown for the general case.
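The special case can also be checked by direct computation. The Python sketch below converts each test score to a standard score and then to a percentage; the function name is hypothetical, and the normal integral is evaluated with the complementary error function rather than read from Table III.

```python
import math

def percent_at_or_above_mean(x, r, m_x, s_x):
    """Percent of persons with score x expected to equal or exceed the average Y."""
    z_x = (x - m_x) / s_x
    z_c = -r * z_x / math.sqrt(1.0 - r * r)  # z_c from z_x, r
    return round(100.0 * 0.5 * math.erfc(z_c / math.sqrt(2.0)))

# Values from the worked example in the text.
special_case_percents = [percent_at_or_above_mean(x, 0.45, 105.2, 18.7)
                         for x in (50, 75, 100, 125, 150)]
```

Only the mean and standard deviation of X and the correlation enter the computation, which is what allows the percentages for this case to be tabled once for all problems.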
A NOTE ON SOLUTIONS OF ANALYSIS OF VARIANCE FOR
THE PROBLEM OF UNEQUAL OR DISPROPORTIONATE
SUBCLASS NUMBERS

FEI TSAO
National Central University, Chungking, China

and

PALMER O. JOHNSON
University of Minnesota


Tsao* has presented the solutions of analysis
of variance for the problem of unequal
or disproportionate subclass numbers under
different assumptions. We may treat those
solutions as general ones with the problem of
equal or proportionate subclass numbers as a
special case. To give a simple illustration, let
us assume that we have a variable y capable
of the classifications of two columns and two
rows. Define y_sit as the t-th observation on y
in the subclass (s, i), where s = 1, 2 denote
first and second columns, respectively, i =
1, 2 denote first and second rows, respectively,
and t = 1, ..., n_si. Define again:

\[ \sum_{s}\sum_{t} y_{sit} = \sum_{s} T_{si} = \text{[illegible]} \]

\[ \sum_{i}\sum_{t} y_{sit} = \sum_{i} T_{si} = \text{[illegible]} \]

[A third definition is illegible.]


*F. Tsao, "General Solution of the Analysis of Variance
and Covariance in the Case of Unequal or Disproportionate
Numbers of Observations in the Subclasses," Ph.D. Thesis,
University of Minnesota, 1945, 115 pp.


The solutions for the estimates of the sums 
of squares for the different sources of varia- 
tion under different assumptions are as fol- 
lows: 
[Equations (12) through (20), which give these solutions, are not
legible in this copy. Of the accompanying headings, only "zero
interaction:" and "(4) Main effects under the assumption of
significant interaction:" survive.]

In the special case of equal frequencies
with 2 × 2 classification, we have the following
estimates:

(1) Within:

\[ \sum_{s}\sum_{i}\sum_{t} y_{sit}^{2} - \frac{\sum_{s}\sum_{i} T_{si}^{2}}{n} \tag{21} \]

where n is the number of observations for
each subclass.

(2) Interaction:

\[ \frac{\sum_{s}\sum_{i} T_{si}^{2}}{n} - \frac{\sum_{i} \left( \sum_{s} T_{si} \right)^{2}}{2n} - n(\bar{y}_{1\cdot} - \bar{y}_{2\cdot})^{2} \tag{22} \]

(3) Main effects:

(3.1) Column:

\[ n(\bar{y}_{1\cdot} - \bar{y}_{2\cdot})^{2} \tag{23} \]

(3.2) Row:

\[ n(\bar{y}_{\cdot 1} - \bar{y}_{\cdot 2})^{2} \tag{24} \]

Now we go back to the solutions for the
problem of unequal or disproportionate subclass
numbers. We can show that those solutions
will approach these same results. The
mathematical operations are not difficult. In
the case of equal subclass numbers, (12)
easily reduces to:

\[ \sum_{s}\sum_{i}\sum_{t} y_{sit}^{2} - \frac{\sum_{s}\sum_{i} T_{si}^{2}}{n} \]

which is identical to (21). Next we show that
the following identity holds, if n_si = n:

\[ \text{[left-hand side illegible]} = n(\bar{y}_{1\cdot} - \bar{y}_{2\cdot})^{2} \]

Therefore, (13) reduces to (22) and (14)
reduces to (23). Again, the following identity
holds, if n_si = n:

\[ \text{[left-hand side illegible]} = n(\bar{y}_{\cdot 1} - \bar{y}_{\cdot 2})^{2} \]

i.e., (16) reduces to (24). Now we show that
(15) is also identical to (24), if n_si = n:

\[ \text{[first part illegible]} = 2n(\bar{y}_{\cdot 1}^{2} + \bar{y}_{\cdot 2}^{2}) - n(\bar{y}_{1\cdot} + \bar{y}_{2\cdot})^{2} = 2n(\bar{y}_{\cdot 1}^{2} + \bar{y}_{\cdot 2}^{2}) - n(\bar{y}_{\cdot 1} + \bar{y}_{\cdot 2})^{2} = n(\bar{y}_{\cdot 1} - \bar{y}_{\cdot 2})^{2} \]

When the interaction exists and under the
restrictions of "the weighted means", we
have [expression illegible], i.e., (17) is identical
to (23), and [expression illegible], i.e., (18) is
identical to (24). When the interaction exists
and under the restrictions of "the unweighted
means", we have:

\[ \text{[left-hand side illegible]} = n(\bar{y}_{1\cdot} - \bar{y}_{2\cdot})^{2} \]

i.e., (19) is identical to (23), and [expression
illegible], i.e., (20) is identical to (24). So we
conclude that in the case of equal frequencies,
all the solutions under different assumptions
concerning main effects will approach the same
results.
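The equal-frequency estimates for the 2 × 2 classification can be verified numerically with a short Python sketch. This is an illustration only: the function name, the layout y[s][i] (a list of n observations for column s and row i), and the data are hypothetical, and the interaction is computed in one of several algebraically equivalent forms.

```python
def balanced_2x2(y):
    """Sums of squares for a 2 x 2 classification with n observations per subclass."""
    n = len(y[0][0])
    T = [[sum(y[s][i]) for i in range(2)] for s in range(2)]  # subclass totals
    col_mean = [(T[s][0] + T[s][1]) / (2 * n) for s in range(2)]
    row_mean = [(T[0][i] + T[1][i]) / (2 * n) for i in range(2)]
    sum_sq = sum(v * v for s in range(2) for i in range(2) for v in y[s][i])
    cells = sum(T[s][i] ** 2 for s in range(2) for i in range(2)) / n
    rows = sum((T[0][i] + T[1][i]) ** 2 for i in range(2)) / (2 * n)
    within = sum_sq - cells
    column = n * (col_mean[0] - col_mean[1]) ** 2
    row = n * (row_mean[0] - row_mean[1]) ** 2
    interaction = cells - rows - column
    return within, interaction, column, row

# Hypothetical data: three observations in each of the four subclasses.
y = [[[3, 5, 4], [6, 8, 7]],
     [[2, 4, 3], [9, 9, 12]]]
within, interaction, column, row = balanced_2x2(y)

# Total sum of squares about the grand mean, for comparison.
values = [v for s in y for cell in s for v in cell]
total = sum(v * v for v in values) - sum(values) ** 2 / len(values)
```

With equal subclass numbers the four estimates add up to the total sum of squares about the grand mean, which is the property that is lost when the subclass numbers are disproportionate.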

It is noted that in the case of proportionate
frequencies, we can also treat the data in the
usual manner, when we estimate the sums of
squares for "within" and "interaction". For
the main effects, however, we may treat the
data in the usual manner only if we assume
that there is no interaction. On the other
hand, when the interaction exists, then we
can treat the data in the usual manner only
if we use the restrictions of "the weighted
means". In other words, when the interaction
exists and under the restrictions of "the
unweighted means", we have to use equations
(19) and (20) to estimate the sums of
squares due to columns and rows, respectively,
even if the class numbers are proportional.
The details of mathematical demonstrations
will not be given here.





THE PLACE OF INSTRUMENTATION IN THE READING 
PROGRAM: I. EVALUATION OF THE OPHTHALM-O-GRAPH 


IRVING H. ANDERSON and WILLIAM C. MORSE
University of Michigan 


Raymond Dodge, in 1900, originated the 
photographic method of recording eye-move- 
ments. Scores of eye-movement cameras have 
since been built by research workers in labo- 
ratories throughout the country and hundreds 
of theses and articles have been written re- 
porting the results of studies made. Of the 
findings relating to the psychology of reading, 
the chief one is that good and poor readers 
can be differentiated in terms of the pattern 
of their eye-movements. Good readers make 
relatively few fixations per line, relatively few 
regressions, and the average duration of their 
fixations is relatively brief. Poor readers re- 
quire relatively many fixations per line, rela- 
tively many regressions, and the average time 
of their fixations is relatively long. Since there 
are these differences, the question has been 
raised whether eye-movement records might 
not be used to supplement the usual paper- 
and-pencil tests of reading achievement or 
even replace them. This question must be 
answered in terms of the reliability and valid- 
ity of the method, the usual test criteria, as 
well as in terms of ease of handling and 
economy. The Ophthalm-O-Graph* is the 
only eye-movement camera that is at all prac- 
tical for general school use. The present 
article is accordingly related to this instru- 
ment. 


RELIABILITY OF THE OPHTHALM-O-GRAPH 


Table I summarizes the results of studies 
which have dealt with the reliability of eye- 
movement scores. Imus, Rothney, and 
Bear (10), and Broom (2) secured their rec- 
ords with the Ophthalm-O-Graph and used as 
the material read before the camera the pass- 
ages distributed with the instrument. These 
passages were each 50 words in length. Broom 
used two selections which were read before the 
camera during different sittings separated by 
at least one day. Imus, Rothney, and Bear 
used three cards, all of which were read dur- 
ing the same sitting. In both studies three 


* [Beginning of footnote illegible] the American Optical Company,
Southbridge, Massachusetts.


measures were computed for each subject: 
1. fixation frequency, 2. regression frequency, 
and 3. rate of reading. The test-retest method 
was employed to determine the reliability of 
these measures. Table I shows that the reli- 
ability coefficients were low, especially so in 
the Imus, Rothney, and Bear study. The cor- 
relations for fixation frequency, which was 
the most reliable measure, ranged from .61 to 
.72 for different combinations of the three 
cards. These correlations are too low for indi- 
vidual diagnosis and placement. The same 
may be said for Broom’s results, even though 
his correlations were somewhat higher. A
correlation of .90 is the usually accepted
standard in this connection.

The chief reason that low correlations were 
obtained in the above two studies is that the 
passages used were too short. Fifty words are 
not enough to yield a reliable measure. An- 
other limitation is related to the method used 
to test the subject’s comprehension of the 
material. Ten true-false questions are fur- 
nished for each card. These questions appar- 
ently do not yield an adequate measure of 
comprehension or even an indication that the 
subject has read the material carefully. Imus, 
Rothney, and Bear found that their subjects 
answered almost as many questions correctly 
without reading the material as with reading 
it. Broom found that the reliability of these 
true-false tests was practically nil, the
reliability coefficient being only .10.

Eurich (5, 6), Frandsen (8), and Litterer 
(11) have obtained fairly high reliability co- 
efficients for various eye-movement measures. 
These results, however, are affected by the 
methods which were employed to determine 
reliability. In one of his studies Eurich used
the split-half method; Litterer also used the
split-half method, and Frandsen used the
odd-even method. Correlations obtained by these
methods are based on a single performance
and are likely to be higher than correlations
obtained by the test-retest method. Stepped
up by the Spearman-Brown formula for the
full length of the material used, the results
nonetheless give some indication of the
amount of material needed for a reliable test. 
Litterer’s estimated correlations for fourteen 
lines of easy prose indicate satisfactory reli- 
ability. Frandsen’s correlations for an unspe- 
cified amount of scientific prose are close to 
satisfactory for fixation frequency and total 
perception time. Eurich’s figures for a total 
of only 124 words are satisfactory for pause 
duration and close to satisfactory for fixation 
frequency. In Eurich’s second study satisfac- 
tory reliability was obtained for fixation fre- 
quency and regression frequency and close to 
satisfactory reliability for pause duration. 
The test-retest method was used in this study, 
three separate paragraphs of non-technical 
material were read by each subject all during 
one sitting, and correlations were computed 
between the results for every combination of 
two paragraphs. The correlations in Table I 
were estimated by the Douglass-Cozens
formula for the reliability of test batteries 
and represent the results for a combined total 
of 181 words. 
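
The stepping-up referred to above can be illustrated with a minimal sketch of the Spearman-Brown formula in its standard form, r_k = k r / (1 + (k - 1) r). The function name and the half-test value of .70 are hypothetical and are not taken from any of the studies cited.

```python
def spearman_brown(r_short: float, k: float = 2.0) -> float:
    """Estimate the reliability of a test made k times as long.

    r_short is the reliability observed for the shorter test
    (for the split-half method, the correlation between halves, with k = 2).
    """
    return k * r_short / (1.0 + (k - 1.0) * r_short)


# Hypothetical split-half correlation, stepped up to the full length.
half_correlation = 0.70
print(round(spearman_brown(half_correlation), 3))
```

Because the stepped-up figure is an estimate from a single sitting, it may be expected to run higher than a test-retest coefficient obtained from separate sittings.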

In none of the above studies was an 
attempt made to determine the reliability of 
eye-movement scores for varying amounts of 
material by the test-retest method where the 
tests were administered during different sit- 
tings. Tinker’s study (17) comes closest to 
meeting these ideal requirements. Correlations 
were run for a maximum of 23 versus 23 lines 
of easy prose as well as for a maximum of
38 versus 39 lines of difficult prose. The 
highest correlation obtained for a single 
measure was .88. This was for fixation fre- 
quency in the case of the easier material. The 
average correlation for all measures at both 
difficulty levels was .82. 


Actually, the correlations in Table I com- 
pare rather well with the reliability coeffi- 
cients for some of the more commonly used 
standard paper-and-pencil tests of reading 
achievement. Table II presents the reliability 
figures for a representative sampling of the 
latter. The results for rate of reading in the 
case of the four tests which contain this 
measure are perhaps the most significant for 
comparison with the findings for eye-move- 
ment scores. Measures of eye-movement are 
essentially measures of rate of reading. There 
is very little choice: eye-movement scores are 
just as reliable as standard paper-and-pencil 
tests of reading speed. The remainder of the 
correlations in Table II are in general higher 
than the reliability coefficients for eye-move- 
ment scores, but the typical paper-and-pencil 
test also contains considerably more material 
than the typical test before the eye-movement 
camera. The difference frequently amounts to 
a thousand words or more. Considering this
difference, it is remarkable that eye-movement
scores have turned out as reliable as they
have. It appears that eye-movement measure-
ment is fundamentally a reliable technique.


TABLE II
RELIABILITIES OF TEN STANDARD READING TESTS

To summarize the facts on reliability: The 
eye-movement technique will yield reliable 
scores if an adequate amount of material is 
used. Tinker (18) has recommended a mini- 
mum of twenty lines. Test-retest findings for 
this amount of material may not yield reli-
ability coefficients of .90. If this requirement
were held to rigidly, many standard tests now 
on the market would have to be withdrawn. 
For the identification of extreme cases, for use 
together with other measures on the child, 
twenty lines are enough. Half this number are 
sufficient for group comparisons. 


VALIDITY OF THE OPHTHALM-O-GRAPH 


The universal method for determining the 
validity of eye-movement scores has been to 
correlate these measures with scores on stand- 
ard tests. Table III summarizes the results of 
studies which have investigated the validity 
of the eye-movement technique. For the most 
part the validity coefficients are low. The cor- 
relations obtained by Imus, Rothney, and 
Bear (10) were only .40, .25, and .32 respectively
for fixation frequency, regression frequency,
and rate of reading. Eurich's (5, 6)
correlations were equally low. Litterer’s
(11) figures while higher still permit of sig- 
nificant variation between the measures. The 
same may be said for Anderson’s (1) results. 

The above results are not alarming. 
Strang (15) found a correlation of only .28 
between paragraph meaning as measured by 
the Minnesota Reading Examination and par- 
agraph meaning as measured by the Iowa 
Silent Reading Test. Gates (9) found a corre- 
lation of .53 between Brown rate and Courtis 
rate and one of .54 between Courtis rate and 
Monroe rate. Eurich’s (4) correlations be- 
tween scores on different forms of the Minne- 
sota Speed of Reading Test and different 
forms of the Chapman-Cook Speed of Read- 
ing Test range from .63 to .76. Results ob- 
tained by Broom, Douglas, and Rudd (3) 
have the same import, namely, that low corre- 
lations frequently occur between standard test 
scores. Only when the materials concerned are 
comparable in nature will high correlations be 
obtained. Thus, Paterson and Tinker (12) 
found a correlation of .86 between scores on 
Forms A and B of the Chapman—Cook Speed 
of Reading Test, but Tinker (16) one of only 
.46 between rate on easy prose and rate on 
scientific prose. Pressey and Pressey (13)
investigated the interrelations between four 
highly reliable reading scales, with the fol- 
lowing results: 

General 1 vs. General 2      r = .85
General 1 vs. poetry         r = .38
[two entries illegible in the source]
General 2 vs. scientific     r = .49
scientific vs. poetry        r = .[?]6

These findings demonstrate strikingly how the 
relationship between reading tests is affected 
by the nature of the material. Similar results 
have been obtained by Robinson and Hall (14). 

The eye-movement technique is subject to 
the same conditions that govern the relation- 
ships between standard tests: high correla- 
tions will only be obtained when the material 
read before the camera and the content of the 
criterion are comparable. By using two selec- 
tions of fourteen lines each from the 
Chapman—Cook Speed of Reading Test as the 
selections to be read before the camera and 
scores on the entire test as the criterion, 
Tinker (17) has taken the above condition 
into account. His correlations, when corrected 
for attenuation, ranged from .80 to .99 for 
fixation frequency and from .87 to .90 for
total perception time. Low correlations were 
reported for pause duration. No correlations 
were reported for regression frequency. The 
type of reading which the Chapman—Cook 
Test brings into play makes it difficult to 
measure regression frequency accurately. It is 
probable, however, that regression frequency 
at best is a measure of only fair validity. 

To summarize the facts on validity: When 
determined in the usual way, the validity 
coefficients for eye-movement scores turn out 
low. Low correlations have similarly been 
found between standard test scores. Reading 
skill is to a large extent specific to the mate- 
rial used and to the test situation. High cor- 
relations require the use of comparable mate- 
rials in the camera situation and in the cri- 
terion. When determined on this basis, the 
validity coefficients for fixation frequency and 
total perception time turn out extremely high. 
Low validity will be found for pause duration 
and only fair validity for regression fre- 
quency. 

DISCUSSION 

From the standpoint of reliability and
validity, there is little choice between the eye-
movement technique and standard tests. The
deciding factor will ordinarily be one of
convenience. Standard tests can be administered
to whole groups at once, they are easier to 
handle than the eye-movement technique, and 
they will ordinarily involve less expense than 
the latter. Eye-movement measurement will 
never replace standard tests. This is no re- 
flection on the value of eye-movement records, 
but merely a recognition of the fact that 
school authorities will select the method which 
is most practical. It would be desirable to 
obtain an eye-movement record on each child. 
Because of the low correlations which exist 
between measures of reading in general, every 
new measure that is obtained will contribute 
something new to our information about the 
child’s reading status. However, if it came to 
photographing the eye-movements of any con- 
siderable number of children, the eye-move- 
ment technique would again run into the bar- 
rier of inconvenience. It would be easier to 
administer a second standard test, if the main 
purpose is to obtain a supplementary measure 
on each child. 

It has been suggested that a record of the 
frequency, placement, and duration of fixa- 
tions provides important information regard- 
ing vocabulary difficulties. This is true, 
although Fairbanks (7) has shown that fixa- 
tions can be accurately located from 
Ophthalm-O-Graph records only if a correc- 
tion is made for the tendency of the eyes to 
converge as they move down the page. 
Whether eye-movement records will be col- 
lected to disclose vocabulary difficulties will 
again depend on factors of convenience. It 
will ordinarily be easier to resort to an oral 
reading examination. 

The Ophthalm-O-Graph has been most fre- 
quently used in clinical work with retarded 
readers. In a clinical setting eye-movement 
records have the distinct advantage of being 
graphic. As Tinker (18) has observed, it is 
more convincing to show a person that he 
makes too frequent fixations and regressions 
than to tell him merely that he reads too 
slowly. In other words, eye-movement records 
have distinct motivational value, which is all 
to the good from the standpoint of clinical 
work. Eye-movement records can also be used 
to check for the accuracy with which the re- 
turn sweep is made, the accuracy with which 
fixations are made along the line, abnormal 
degrees of convergence and divergence at the 
beginning and end of fixations, and binocular 
coordination. Information of this sort is useful
in diagnostic and remedial reading. 

The Ophthalm-O-Graph has found popular 
use as a research instrument. In this connec- 
tion various workers have found it advisable 
at times to modify or adapt the instrument to 
their needs. These modifications have in- 
cluded the addition of a head line, provision 
for the simultaneous recording of eye move- 
ments and voice, and provision for a slower 
camera speed, interrupted time line, and 
paper film. 

SUMMARY 


The following specific statements may be 
made regarding the Ophthalm-O-Graph: 


1. When used with passages of suitable 
length, it will yield reliable records. Twenty 
lines have been recommended as the minimum 
to use in work with individual cases. 

2. When properly determined, the validity 
of the technique can also be demonstrated to 
be satisfactory. 

3. Ophthalm-O-Graph records can thus be 
used either to replace or supplement standard 
paper-and-pencil tests of reading achieve- 
ment. 

4. For reasons of convenience, however, 
standard tests will remain the universal 
method of testing reading achievement. 

5. The Ophthalm-O-Graph can be used 
with greatest profit in clinical work with 
retarded readers. 

6. The Ophthalm-O-Graph has been a pop- 
ular research instrument and will continue to 
be so. 


CERTAIN PROPERTIES OF THE CORRELATION COEFFICIENT


H. G. JOHNSON
University of Minnesota 


One of the earlier formulas used by Pear- 
son in computing the product-moment coeffi- 
cient was the difference formula. When the 
means and standard deviations of the two 
measures are equal, this formula may be put 
in these forms: 

$$r = 1 - \frac{\sum (x - y)^2}{2\sum x^2} \qquad (1)$$

$$r = 1 - \frac{\sum (X - Y)^2}{2\sum x^2} \qquad (2)$$

Throughout this paper, small x and y will 
refer to deviations from respective means, and 
large X and Y will refer to raw scores. 

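The expression $\sum (X - Y)^2$ in the second
formula may be considered as being equal to
four times the sum of the trait variances,
the trait variance being computed in the usual
manner by the formula $V_t = \sigma_t^2 = \frac{\sum d^2}{N}$,
where $d$ is the deviation of each of a pupil's
scores from that pupil's own mean.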


Suppose that a pupil makes scores of 96 and 
90 on two different tests. The mean of these 
two scores is 93, the deviation from the mean 
is three in each case, and N is equal to two 
since there are two traits. When computed in 
this manner, the square of the difference 
between any two scores will be equal to four 
times the trait variance: 
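$$(X - Y)^2 = 4V_t$$
and
$$\sum (X - Y)^2 = 4\sum V_t$$

Substituting in formula (2) we get

$$r = 1 - \frac{4\sum V_t}{2\sum x^2} = 1 - \frac{2\sum V_t}{\sum x^2} \qquad (3)$$

$$r = 1 - \frac{2\sum V_t}{N V_i} = 1 - \frac{2\bar{V}_t}{V_i} \qquad (4)$$

in which $V_t$ refers to trait variance, $\bar{V}_t$ the
mean of the trait variances, and $V_i$ the
individual variance.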
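
The relation expressed in formula (4) can be checked numerically. The sketch below assumes the form r = 1 - 2(mean trait variance)/(individual variance) described in the text, uses made-up scores for six pupils, and converts both tests to unit deviations so that the means and standard deviations are equal, as the derivation requires. All function names are illustrative only.

```python
from statistics import fmean, pstdev


def unit_deviations(scores):
    """Convert raw scores to deviations from the mean in standard-deviation units."""
    mean, sd = fmean(scores), pstdev(scores)
    return [(s - mean) / sd for s in scores]


def trait_variance(a, b):
    """Variance of one pupil's two scores about that pupil's own mean (N = 2)."""
    mean = (a + b) / 2
    return ((a - mean) ** 2 + (b - mean) ** 2) / 2


def product_moment_r(xs, ys):
    """Ordinary product-moment coefficient computed from unit deviations."""
    zx, zy = unit_deviations(xs), unit_deviations(ys)
    return fmean([a * b for a, b in zip(zx, zy)])


def r_from_variances(xs, ys):
    """r = 1 - 2 * (mean trait variance) / (individual variance)."""
    zx, zy = unit_deviations(xs), unit_deviations(ys)
    mean_trait_var = fmean([trait_variance(a, b) for a, b in zip(zx, zy)])
    individual_var = fmean([a * a for a in zx])  # equals 1 for unit deviations
    return 1 - 2 * mean_trait_var / individual_var


# Hypothetical raw scores for six pupils on two tests.
test_x = [96, 88, 75, 60, 82, 91]
test_y = [90, 85, 70, 72, 77, 95]

# The pupil in the text: scores of 96 and 90 give a trait variance of 9,
# and (96 - 90) ** 2 is four times that amount.
print(trait_variance(96, 90), (96 - 90) ** 2)
print(product_moment_r(test_x, test_y))
print(r_from_variances(test_x, test_y))  # agrees with the line above up to rounding
```

The two coefficients agree only because both tests were first reduced to unit deviations; with raw scores of unequal means or standard deviations the difference formula does not apply.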

From formula (4), it is apparent that a 
correlation coefficient expresses how individ- 
uals differ in two traits in terms of how they 
differ from one another. The coefficient 
measures trait variability (intra-individual 


differences) in terms of individual variability 
(inter-individual differences). There is little 
cause for wonder, therefore, that individual 
variability, or range, may exert a pronounced 
effect on the size of a coefficient. 

Since the correlation coefficient measures 
trait variability in terms of individual vari- 
ability, the intercorrelations between traits 
may be used to determine the per cent that 
trait variability is of individual variability. 
Ghiselli (1) and Preston (4) have worked out 
a formula for this purpose. In the symbols 
used by Ghiselli, the formula is 

$$M_v = 1 - \frac{1}{n} - \frac{(n - 1)\,M_r}{n}$$

in which $M_v$ is the mean variance of trait
variability, $n$ is the number of tests or traits,
and $M_r$ is the mean of the intercorrelations.
When unit deviations are used, $M_v$ is the per
cent that trait variance is of individual
variance.
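
A minimal sketch of this computation follows. It assumes the formula has the form M_v = 1 - 1/n - (n - 1) M_r / n, a reading adopted here because it reduces to r = 1 - 2 M_v when there are two tests; the function name and the sample values are hypothetical.

```python
def mean_trait_variance(mean_r: float, n: int) -> float:
    """Share of individual variance represented by trait variance.

    mean_r is the mean of the intercorrelations among n tests,
    all expressed in unit deviations (assumed form of the formula).
    """
    return 1.0 - 1.0 / n - (n - 1) * mean_r / n


# With two tests the result is (1 - r) / 2 for a single coefficient r.
for r in (1.0, 0.5, 0.0, -1.0):
    print(r, mean_trait_variance(r, 2))

# Hypothetical battery of ten subtests with a mean intercorrelation of .45.
print(mean_trait_variance(0.45, 10))
```

For two tests the function returns 0, .25, .50, and 1.00 for coefficients of 1, .50, 0, and minus one respectively.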

The writer used this formula to compute 
the per cent that trait variance is of individ- 
ual variance for 55 pupils on the ten subtests 
of the Stanford Achievement Test and ob- 
tained a result of 47.97 per cent. Actual com- 
putation of the mean trait variance and the 
mean individual variance by the same proce- 
dure as employed by Hull (2) produced a 
result of 48.38 per cent that trait variance is 
of individual variance. The difference in the 
results is only .41 of one per cent. The writer, 
therefore, regards the Ghiselli—Preston 
formula as being highly reliable. 

Students in education are often warned not 
to regard the correlation coefficient as a per 
cent. Actually it is one minus a per cent; that 
is, it is one minus twice the per cent that 
trait variability is of individual variability. 
This is indicated by formula (4) or by solving
Ghiselli’s formula for r when $n$ is equal
to two and $M_r$ is a single coefficient. When
$M_r$ is equal to a single coefficient, r, Ghiselli’s
formula becomes

$$r = 1 - 2M_v$$

It will be recalled that $M_v$ is the per cent
that trait variance is of the individual
variance.

Thus a correlation of one indicates that the 
trait variability is zero per cent of individual 
variability; a correlation of .50 indicates 25
per cent; a correlation of zero, 50 per cent; 
and a correlation of minus one indicates that 
the trait variability is 100 per cent of the 
individual variability. In a way, then, we may 
regard the coefficient as a per cent. 

It is proper to average per cents only when 
the base is constant. We cannot say that the 
average of 40 per cent of 200 and 60 per cent 
of 400 is 50 per cent of either 200 or 400— 
it is even incorrect to say that the average 
will equal 50 per cent of 300. On the other 
hand, it is correct to say that the average of 
40 per cent of 200 and 60 per cent of 200 is 
50 per cent of 200. In the case of the correla- 
tion coefficient, we may regard individual 
variability (or range) as the base. It follows 
that we should not average coefficients unless 
they have been obtained from samples of the 
same variability. 
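
The arithmetic of this argument can be laid out in a short sketch. The figures are those of the text; the function name is an assumption made for illustration.

```python
def mean_amount(parts):
    """Average the actual amounts represented by (per cent, base) pairs."""
    amounts = [pct * base / 100 for pct, base in parts]
    return sum(amounts) / len(amounts)


# Unequal bases: 40 per cent of 200 and 60 per cent of 400.
unequal = mean_amount([(40, 200), (60, 400)])
print(unequal)              # the average amount
print(100 * unequal / 300)  # its per cent of 300, which is not 50

# Equal bases: 40 per cent of 200 and 60 per cent of 200.
equal = mean_amount([(40, 200), (60, 200)])
print(100 * equal / 200)    # here the per cents may be averaged directly
```

Only in the second case does the average of the per cents describe the average amount, which is the reason for requiring a constant base before coefficients are averaged.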

In spite of the argument just cited, the 
writer believes that little error will be com- 
mitted when coefficients obtained on samples 
from the same age or grade groups are aver- 
aged, even though the variability of the 
samples may differ. Thus, if a coefficient of 
.50 is obtained between two traits for a fairly 
homogeneous group of sixth-grade children,
and a coefficient of .65 is obtained between 
the same two traits for a heterogeneous group 
of sixth-grade (or fifth-grade) children, the 
two coefficients may be averaged with only 
slight error resulting. Reasons will be given 
later why the writer feels that under such 
conditions differences in range have only a 
slight effect on the magnitude of a trait coeffi- 
cient. Of course, the research worker who does 
not care to be criticized can always quote the 
median when he desires to give a typical 
coefficient for a typical sample. The mean, 
however, seems to be more commonly em- 
ployed in statistical work and there is no 
reason why both cannot be given. 

To adjust correlations for a difference in 
range, Kelley (3) developed a formula which 
may be written in this form 

$$r_2 = 1 - \frac{V_1 (1 - r_1)}{V_2}$$

in which $V_1$ is the variance of the first sample
and $V_2$ the variance of the second sample.
The estimated coefficient for the second
sample is $r_2$.

One way of developing this formula is as
follows. When $\sum x^2$ is equal to $\sum y^2$, it can
readily be shown that $\sum (x - y)^2$ is equal to
$2\sum x^2 (1 - r)$. Formula (1) may therefore be
written as

$$r = 1 - \frac{2\sum x^2 (1 - r)}{2\sum x^2} \qquad (6)$$

Since $\frac{\sum x^2}{N}$ is equal to $V$, formula (6) may
be put into these forms:

$$r = 1 - \frac{2NV(1 - r)}{2NV} \qquad (7)$$

$$r = 1 - \frac{V(1 - r)}{V} \qquad (8)$$


The relation between formula (8) and 
Kelley’s formula is evident. Kelley has simply 
substituted in the denominator of the ratio 
the variance of the second sample for the 
variance of the first sample in order to esti-
mate the correlation for the second sample.

Such a procedure is permissible only when
$V_1(1 - r_1)$ is equal to $V_2(1 - r_2)$, or, if we
trace these two terms back to their source,
when $\frac{\sum (x_1 - y_1)^2}{2N_1}$ is equal to $\frac{\sum (x_2 - y_2)^2}{2N_2}$.

When the number of individuals in both 
samples is the same, Kelley’s formula may be 
written in this form: 

$$r_2 = 1 - \frac{\sum (x_1 - y_1)^2}{2\sum x_2^2}$$


Thus we see that the basic assumption to
the proper use of Kelley’s formula is that the
average value of $(x - y)^2$ in the first sample
must equal the average value of $(x - y)^2$ in
the second sample, or that $\frac{\sum (x_1 - y_1)^2}{N_1}$ is
equal to $\frac{\sum (x_2 - y_2)^2}{N_2}$. There are situations
where this assumption may be met to a satis-
factory degree. If bright, average and dull
children make errors of about the same size
and type, then the assumption is satisfied in 
the instance of reliability coefficients. Kelley’s 
formula may therefore give fairly reliable 
results in adjusting reliability coefficients for 
differences in range. 

Another instance in which the basic 
assumption to Kelley’s formula apparently is 
satisfied is in estimating the trait coefficient 
(or reliability coefficient) for a group com- 
posed of children from several consecutive 
grades (or non-consecutive grades) from the 
coefficient obtained on a group composed of 
children from the same grade. Apparently 
trait coefficients do not differ greatly from 
grade to grade; that is, the correlation be-
tween reading and arithmetic in the fifth
grade is about the same size as the correlation 
between the same two abilities in the sixth 
grade. Also, the standard deviations of suit- 
able tests do not differ greatly from grade to 
grade. We may assume, therefore, that the 
average value of $(x - y)^2$ is about the same
in the various grades. There can be little 
objection to the use of Kelley’s formula to 
estimate the trait or reliability coefficient for 
a range of several grades from a coefficient 
obtained from a range of one grade. It may 
be mentioned also that Peters (5) has fur- 
nished some empirical data which support the 
belief that Kelley’s formula gives reliable 
estimates under such conditions. 

In educational research it is often desired 
to compare correlations obtained from intact 
school grades or age groups. If Kelley’s 
formula held for such correlations, then a 
coefficient of minus one would be changed to 
a coefficient of zero, and a coefficient of zero 
would be changed to .50 merely by doubling 
the variance of the sample. A coefficient 
would be so unstable as to be practically 
meaningless unless a couple of standard devi- 
ations were appended to it. Even then its 
meaning would not be clear as the standard 
deviations of different tests usually are not 
equivalent. 
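
The instability described above follows directly from the arithmetic of the adjustment. The sketch below assumes Kelley's formula in the form r2 = 1 - V1(1 - r1)/V2, which agrees with the condition V1(1 - r1) = V2(1 - r2) stated earlier; the function name is hypothetical.

```python
def kelley_adjust(r1: float, v1: float, v2: float) -> float:
    """Estimate the coefficient in a second sample of variance v2
    from the coefficient r1 observed in a first sample of variance v1
    (assumed form: r2 = 1 - v1 * (1 - r1) / v2)."""
    return 1.0 - v1 * (1.0 - r1) / v2


# Doubling the variance of the sample (v2 = 2 * v1).
for r1 in (-1.0, 0.0, 0.5):
    print(r1, kelley_adjust(r1, v1=1.0, v2=2.0))
```

With the variance doubled, a coefficient of minus one becomes zero and a coefficient of zero becomes .50, as stated in the text.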

From here on the discussion of the influ- 
ence of range on the magnitude of coefficients 
will relate to cases of this type: If the corre- 
lation between two traits has been obtained 
for a relatively homogeneous group of sixth- 
grade children, what is the best estimate of 
the correlation between the same two traits 
for a larger more heterogeneous group of 
sixth-grade (or fifth-grade) children? 

We do not know whether bright, average 
or dull children exhibit the most trait vari- 
ability. However, when there are only two 
traits, the greatest variability will be among 
the individuals near the ends of the distribu- 
tion; in other words, the individuals who are 
high or low on either measure will show the 
greatest regression. The difference between 
corresponding scores is equal to (x — y) 
which also denotes trait variability. There are 
reasons for believing that when the pupils 
tested are from the same age or grade groups, 
the trait variability changes with the indi- 
vidual variability, and that the requirement 
of a constant average value for $(x - y)^2$
essential to Kelley’s formula is not being met. 
To get some empirical evidence that the
average value of $(x - y)^2$ changes as $\sum x^2$
changes, the writer computed the data given
in Table I. The measures used were the sub-
tests of the Stanford Achievement Test. The
raw scores were transmuted to get a mean of
100 and a standard deviation of 12 for each
of the tests. The values of $(x - y)$ for all
pupils at various distances from the mean of
one of the measures were then computed.
Since the standard deviation is 12, the dis-
tance of 0-6 points includes all scores that
are within one-half standard deviation (above
and below) of the mean of the X test. Like-
wise the distance of 7-12 points includes all
scores between one-half and one standard
deviation above and below the mean of the
X test.

It seems clear from this table that the 
average value of (x — y) increases as the 
individuals get farther from the mean. The 
increase in the average value of the sum of 
the squares of these differences is especially 
marked. This is important, for the magnitude 
of a correlation coefficient is dependent on the
value of $(x - y)^2$.

There are reasons for believing, then, that 
when the samples are from the same age or 
grade groups, changes in individual variability 
will be accompanied by changes in trait vari- 
ability. Such a view is supported from what 
we have learned from experience about the 
influence of range upon coefficients of plus 
one, zero and minus one. 

No evidence need be submitted here that 
differences in range have no effect on correla- 
tions of plus or minus one. There may be 
some doubt that range also has no influence 
on zero coefficients. However, consider the 
case where no correlation has been found 
between the color of hair and the spelling 
ability of a group of sixth-grade children. In 
another more heterogeneous group of sixth- 
grade children we also find no correlation 
between the color of hair and spelling ability.
We are led to the conclusion that if there is 
no relationship between two traits, there is no 
relationship, and a mere change in range will 
not produce any. It should be noted that this 
observation is restricted to instances where 
the differences in range are due to differences 
in inherited traits or abilities and not to dif- 
ferences in growth or training which has acted 
the same way on both traits. 

TABLE I

VARIATIONS IN THE AVERAGE VALUE OF $(x - y)$ AND $(x - y)^2$ AT DIFFERENT DISTANCES FROM
THE MEAN WHEN r EQUALS .47

[Table body not recoverable from the source beyond the first column: distance of X score from mean 0-6, N = 141. The rows gave $\frac{\sum (x - y)}{N}$ and $\frac{\sum (x - y)^2}{N}$.]

For a correlation of minus one, trait vari-
ability changes at the same rate as individual
variability; for a correlation of zero, trait
variability changes one-half as rapidly as 
individual variability; and for a correlation of 
plus one, there is no change in trait variability 
with individual variability as there is no trait 
variability. If this relationship holds for other 
coefficients, then for a correlation of .50, trait 
variability changes one-fourth as rapidly as 
individual variability. This would imply that 
the coefficient of .50 (or any other value)
would not be affected by a difference in range, 
since for a correlation of .50 the trait vari- 
ability is one-fourth of the individual vari- 
ability. The inference is that the true coeffi- 
cient between two traits in the case of chil- 
dren from the same grade groups is not 
affected by a difference in range. The observed 
coefficient for the group of narrow range will 
be somewhat lower, however, than the ob- 
served coefficient for the group of wide range 
due to the greater attenuation effects of errors 
of measurement in the group of narrow range. 
This is assuming that the children in the 
group of narrow range make errors of the 
same size as the children in the group of wide 
range. 

To recapitulate briefly: In the case of trait 
coefficients, that part of (x — y) due to 
errors of measurement does not change with 
a difference in range, that part of (x — y) 
due to a difference in inherited abilities does 
change with a difference in range. To com- 
pare two trait coefficients obtained from 
groups of different range, correct the obtained 
coefficient for the group of narrow range for 
attenuation by using the reliability coefficients 
of the narrow range, also correct the obtained 
coefficient for the group of wide range for 
attenuation by using the reliability coefficients 
of the wide range. The two coefficients cor- 
rected for attenuation may be compared 
directly. 
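
The procedure just described can be sketched with the standard correction for attenuation, in which the obtained coefficient is divided by the square root of the product of the two reliability coefficients. All numerical values below are hypothetical and serve only to show the order of the steps.

```python
import math


def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Estimate the true correlation between two traits from the obtained
    coefficient r_xy and the reliability coefficients r_xx and r_yy of the
    two tests, all taken from the same sample."""
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical group of narrow range: reliabilities from that same group.
narrow = correct_for_attenuation(r_xy=0.45, r_xx=0.80, r_yy=0.75)

# Hypothetical group of wide range: reliabilities from that same group.
wide = correct_for_attenuation(r_xy=0.58, r_xx=0.90, r_yy=0.88)

print(round(narrow, 3), round(wide, 3))
```

The point of the design is that each obtained coefficient is corrected with reliabilities computed on its own group, so that the differing effects of errors of measurement are removed before the two values are compared.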

If the view that the true correlation be- 
tween two traits is not greatly affected by 
differences in range is correct, then there is
some justification for the present practice of 
comparing coefficients without paying too 
much attention to the variability or range of 
the samples involved. Also, trait coefficients 
obtained from samples consisting of similar 
age or grade groups may be averaged even 
though the variability is not constant. 


SUMMARY 


In this paper the writer has suggested that 
the correlation coefficient may be considered 
as measuring how individuals differ in two 
traits in terms of how they differ from one 
another; that the coefficient measures trait
variability (intra-individual differences) in 
terms of individual variability (inter-individ- 
ual differences). From this point of view, the 
following topics were discussed: the coefficient 
as a per cent, the use of intercorrelations to 
determine the per cent that trait variance is 
of individual variance, the averaging of coeffi- 
cients, and the influence of range upon the 
magnitude of coefficients. 